The prevailing assumption in artificial intelligence is that intelligence lives in the cloud — that to be smart, a device must be connected. This paper argues the opposite will become true. As models mature and weight compression advances, the edge will not retreat into dumbness when disconnected. It will carry the full depth of human knowledge locally, refreshed not by a live wire but by periodic synchronization of model weights.
We propose a two-model edge architecture: a broadcast-synced sparse expert world model and an on-device augmented personal model with a built-in interpreter head. Combined, this architecture addresses three problems at once: intelligence that is genuinely personal, privacy that doesn't depend on trusting a corporation, and a fraction of the energy and infrastructure cost of cloud inference.
Every major AI deployment today rests on the same implicit bargain: give us your data, your queries, your context — and we will give you intelligence. The user sends their thoughts to a remote server. The server processes them against a vast model. The answer returns. The data stays.
This is not a design choice made for the user's benefit. It is a consequence of scale. The models are too large to run locally. The compute is too expensive to distribute. Every question you ask, every document you analyze, every private thought you process through an AI system passes through infrastructure owned by a handful of corporations.
The problem is not that cloud AI companies are untrustworthy. The problem is that trust should not be required.
What follows is a description of an architecture in which it is not.
This is not an argument against the cloud. The cloud trains better models than any device ever will. It handles tasks that require computation at a scale no edge device can match. What this paper argues is simpler: not everything needs the cloud. A device that carries its own intelligence — that knows the world and knows you, locally, privately, offline — is not a retreat from capability. It is an addition to it. The architecture described here works alongside cloud AI, not instead of it. The local model is the default. The cloud remains available when you choose it. The difference is that choice now belongs to you.
A language model's weights are not code in the traditional sense. They are the accumulated result of exposure to human knowledge — billions of parameters shaped by contact with text, reasoning, and the patterns of thought that underlie both. A model's weights are, in a meaningful sense, a compression of what humanity knows — not a database, not a search index, but a form that encodes understanding rather than storing facts.
The sparse expert architecture described in this paper is not speculative. As of 2026, it is the dominant pattern at the frontier — the majority of leading model releases use sparse mixture-of-experts designs precisely because activating only a subset of experts per query delivers frontier capability at a fraction of the compute cost. The architecture inherits this efficiency directly.
Architectures improve, quantization and distillation advance, and the weight footprint for any given capability level continues to shrink. Capable models will run at the edge. The only question is when — and what the resulting architecture looks like.
→ [1]
A model weight update is categorically different from a software update. When a device receives updated weights, it does not merely gain new procedures. It absorbs new understanding. Months of collective learning compress into a payload the device integrates at rest, offline, without a persistent connection.
This is the sync paradigm: periodic, compressed, offline-compatible transfer of world-knowledge into edge devices. Not streaming. Not querying. Absorbing. In practice, a typical weekly sync payload for a curated expert subset is under 50 MB compressed — version-delta only. It fits comfortably in a background sync over Wi-Fi or metered 5G, with zero personal data ever leaving the device.
The sync request itself is the only fingerprint — and a remarkably thin one. It carries no query content, no personal context, no conversational history. With trivial hardening — batched anonymous requests through a proxy, timing randomization, CDN-style distribution of popular expert bundles — even this signal becomes indistinguishable from population-level noise. The privacy guarantee holds all the way down to the infrastructure layer.
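Concretely, the sync step can be sketched as a version-delta patch applied to the on-device expert store. Everything here (field names, the expert set, the payload shape) is an illustrative assumption, not a specified wire format:

```python
# Illustrative sketch of version-delta expert sync (all names hypothetical).
# A broadcast payload carries only the experts that changed since a known
# base version; the device applies the subset it actually carries.

def apply_sync(local, payload):
    """Apply a delta payload to the on-device expert store, in place."""
    assert payload["base_version"] == local["version"], "deltas must chain"
    for expert_id, delta in payload["expert_deltas"].items():
        if expert_id in local["experts"]:  # only experts this device carries
            w = local["experts"][expert_id]
            local["experts"][expert_id] = [a + b for a, b in zip(w, delta)]
    local["version"] = payload["new_version"]
    return local

device = {"version": 41, "experts": {"physics": [5, -2], "law": [1, 3]}}
payload = {
    "base_version": 41,
    "new_version": 42,
    # the broadcast also contains experts this device never subscribed to
    "expert_deltas": {"physics": [1, 2], "medicine": [9, 9]},
}
apply_sync(device, payload)
```

The device silently discards deltas for experts it does not carry, which is part of what keeps the sync request itself contentless: the server never learns which subset was applied.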
Connectivity becomes optional. Intelligence does not.
Weight synchronization alone is not enough. A device that knows what humanity knows still lacks personal context.
| Component | Reads World Model | Accesses Personal Adapters | Reaches Cloud | Receives from User |
|---|---|---|---|---|
| User | — | — | — | — |
| Augmented Personal Model | Query only | Read / Write | — | Yes |
| World Model | — | — | Sync pull only | — |
| Interpreter Head | Output only | Read only | — | — |
A sparse expert model carries the distilled knowledge of the frontier. Individual expert models are maintained as discrete units — both at the cloud level and on device — enabling selective synchronization matched to usage profile. The world model is pull-only: it receives weight syncs and responds to queries from the augmented personal model. It cannot initiate communication, cannot transmit, and has no access to personal data.
A small, efficient model runs continuously on-device atop a frozen base. It carries thin trainable low-rank adapter layers that encode you — your language patterns, your reasoning style, your domains of interest, your history.
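The adapter mechanism can be sketched in a few lines. This is a generic LoRA-style construction with toy dimensions, not the paper's specification:

```python
import numpy as np

# Minimal low-rank adapter sketch (LoRA-style); dimensions are toy values.
# The frozen base weight W never changes; only the small A and B matrices
# (the "personal delta") are trained on-device.

rng = np.random.default_rng(0)
d, r = 8, 2                       # model dim 8, adapter rank 2
W = rng.normal(size=(d, d))       # frozen base weight
A = rng.normal(size=(r, d))       # trainable down-projection
B = np.zeros((d, r))              # trainable up-projection, init to zero

def forward(x):
    # base path plus low-rank personal path: (W + B A) x
    return W @ x + B @ (A @ x)

x = rng.normal(size=d)
# With B initialised to zero the adapter starts as a no-op:
print(np.allclose(forward(x), W @ x))
```

Only A and B train. Here they total 32 parameters against the base's 64; at realistic dimensions the ratio is far smaller, which is why the whole personal delta stays within megabytes.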
Built directly into this model is a fixed, non-trainable interpreter head — a small cross-attention layer, frozen at compile time. This head is the only component that ever sees both the world model output and your personal adapter. It translates between the two independent embedding spaces without ever letting the models touch, then synthesizes the final response.
The entire personal delta — adapters plus interpreter head — remains just a few megabytes. It is updated incrementally through interaction and never leaves the device. The interpreter head lives inside the same hardware secure enclave as the personal model, making the attack surface extremely narrow by design.
The user interacts only with the augmented personal model. The world model never sees personal context. The personal model never exposes its internal state. All orchestration happens locally, inside a single secure component.
→ [2, 3, 4]
Privacy becomes a property of the system, not a feature of the product. It cannot be revoked by a policy update. It cannot be eroded by a business model change. It simply is.
A device with limited or outdated experts is not broken — it simply has known, expressible limits. The augmented personal model tracks exactly which weights are present, how current they are, and how well-matched they are to the query at hand.
Rather than hallucinating into gaps, the interpreter head surfaces a coverage score — a weighted combination of router activation confidence, expert freshness relative to the query domain, and embedding-space domain relevance. The result is a concrete, honest signal: "Drawing from world knowledge current to March 2026 with 92% domain coverage for quantum error correction."
A user asks about the latest developments in quantum error correction given their prior work on surface codes. The system returns the coverage score, leans on personal adapters to tailor the response to their exact reasoning style and historical context, and surfaces uncertainty where it exists. No hallucination — just transparent, graceful degradation.
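A coverage score along these lines could be computed as follows. The three signal names come from the text; the weights and the linear combination are illustrative assumptions:

```python
# Hypothetical coverage-score sketch. The weighting scheme is an
# assumption for exposition, not a specification from the paper.

def coverage_score(router_conf, freshness, relevance, w=(0.4, 0.3, 0.3)):
    """Weighted combination of three normalised signals in [0, 1]."""
    signals = (router_conf, freshness, relevance)
    assert all(0.0 <= s <= 1.0 for s in signals)
    return sum(wi * si for wi, si in zip(w, signals))

# e.g. strong routing, slightly stale experts, on-domain query
score = coverage_score(router_conf=0.95, freshness=0.85, relevance=0.95)
print(f"{score:.0%} domain coverage")
```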
In degraded states it leans more heavily on what it knows about the user to fill gaps intelligently. The personal adapters don't degrade with the world model. They always reflect you.
Two core models, each with independent embedding spaces, coordinated by a built-in interpreter head inside the augmented personal model.
The interpreter head is a small, fixed, non-trainable cross-attention module — typically 4–8 heads, low dimensionality — frozen at compile time. It takes the world model's output tokens in their native embedding space and attends to them from the personal model's context tokens. Because it is frozen and read-only, no gradients ever flow back into either model during adapter training. It acts purely as a translator while preserving strict separation.
world_output = world_model(query_from_personal) # embedding space W
personal_ctx = personal_adapters(user_history) # embedding space P
translated = interpreter_head.cross_attention(
query=personal_ctx,
key=world_output,
value=world_output
)
final_response = personal_model.decode(translated)
The world model remains a sparse expert system that is pull-only and periodically refreshed via sync. The augmented personal model contains both the trainable low-rank adapters that encode the user and a fixed interpreter head that performs the cross-space synthesis. This head bears the full computational cost of translation while preserving strict separation: the world model never sees personal data, and the personal model never leaks its private state outward.
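The pseudocode above can be grounded in a small self-contained sketch of the cross-attention step. Shapes and projection weights are illustrative; in the architecture the projections would be trained once, then shipped frozen:

```python
import numpy as np

# Minimal frozen cross-attention sketch matching the pseudocode above.
# Dimensions and weights are toy values; in the architecture these
# projections are fixed at compile time and never receive gradients.

rng = np.random.default_rng(0)
d_w, d_p, d_h = 16, 8, 8           # world dim, personal dim, head dim

Wq = rng.normal(size=(d_p, d_h))   # frozen query projection (personal space)
Wk = rng.normal(size=(d_w, d_h))   # frozen key projection (world space)
Wv = rng.normal(size=(d_w, d_h))   # frozen value projection (world space)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def interpreter_head(personal_ctx, world_out):
    """Personal tokens attend over world tokens; a read-only translation."""
    q = personal_ctx @ Wq                    # (n_p, d_h)
    k = world_out @ Wk                       # (n_w, d_h)
    v = world_out @ Wv                       # (n_w, d_h)
    attn = softmax(q @ k.T / np.sqrt(d_h))   # (n_p, n_w)
    return attn @ v                          # translated context, (n_p, d_h)

personal_ctx = rng.normal(size=(4, d_p))     # 4 personal context tokens
world_out = rng.normal(size=(6, d_w))        # 6 world-model output tokens
translated = interpreter_head(personal_ctx, world_out)
print(translated.shape)
```

Because the head only reads both spaces and writes into a third tensor, neither model's weights appear on the other side of the boundary.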
All other architectural benefits — selective sync, graceful degradation, per-expert quantization, and incremental on-device training — remain unchanged.
Neither component requires new invention. Sparse expert world models are already the dominant frontier architecture. Adapter-based on-device personalization with a frozen base is already in production — major device manufacturers ship exactly this pattern today. The contribution here is the combination: privacy-first, offline-resilient, locally intelligent.
→ [4, 7, 8]
The interpreter relationship described here may itself be more than a fixed mechanism — it may be emergent. Prior work in multi-agent reinforcement learning has shown that neural networks develop compressed, task-specific communication protocols unprompted when given a consistent interface and a signal that rewards successful coordination. The personal model learning to read the world model through observation may follow the same principle: given enough interaction, a high-bandwidth AI-native channel develops that is more efficient than anything a human designer would specify, and specific to this device, this user, this pair of models.
Two findings from the broader literature suggest this channel may be more learnable than expected. Large language models have been shown to develop functional organization strikingly similar to human brain fMRI activation patterns, with consistent specialization emerging independently across models of similar architecture trained on similar data. And the human brain itself exhibits fractal organizational structure across scales, with growing evidence that sufficiently trained artificial networks converge toward similar self-similar structure under the pressure of efficient compression. If independently trained models develop analogous internal organization, the personal model is not learning to read an alien space — it is finding correspondence between two instances of the same underlying pattern.
Whether oscillatory synchronization between models — analogous to neural entrainment in biological systems — could accelerate the formation of this channel, and whether fractal scale-invariance makes it transferable across capability upgrades, are open research questions. The architecture creates the conditions. What emerges is worth measuring.
→ [5, 6, 9, 10, 11, 12, 13, 14, 15]
This architecture does not require a utopian leap. It is viable today at reduced capability, and becomes more capable as hardware advances. The weight complexity synced to a device scales with what that device can run. The architecture is constant across the capability curve — only the ceiling changes.
As edge chips improve, larger and more capable world model weights become syncable. As world models become more efficient through distillation and quantization, they fit on less capable hardware. Both curves accelerate each other. The transition is a ramp that builds itself.
In a sparse expert architecture, different expert weights can carry different precision profiles — frequently used experts at higher precision, rarely accessed fallbacks more aggressively quantized. The augmented personal model's usage patterns inform this profile over time — the device becomes progressively better calibrated to its owner without any external direction.
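A usage-driven precision profile could be as simple as the following sketch; the thresholds and bit-widths are illustrative assumptions:

```python
# Hypothetical sketch of a usage-driven per-expert precision profile.
# Thresholds and bit-widths are illustrative, not a specification.

def precision_profile(usage_counts, total):
    """Assign more bits to experts the router activates most often."""
    profile = {}
    for expert, count in usage_counts.items():
        share = count / total
        if share >= 0.20:
            profile[expert] = 8    # hot expert: 8-bit weights
        elif share >= 0.05:
            profile[expert] = 4    # warm expert: 4-bit weights
        else:
            profile[expert] = 2    # cold fallback: aggressive quantization
    return profile

usage = {"writing": 520, "code": 310, "law": 60, "geology": 10}
print(precision_profile(usage, total=900))
```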
Today, a mid-range smartphone running a 3–7 billion parameter quantized world model delivers genuine general intelligence locally — summarization, reasoning, writing, analysis — across the domains its expert profile covers. The personal adapter set starts empty and becomes calibrated within days of normal use. The architecture is useful from day one and better by day thirty. At the other end of the curve, as dedicated edge AI chips mature and model compression continues, the gap between local and cloud capability narrows to the point where the distinction becomes about latency and privacy preference rather than raw intelligence. The transition does not require waiting for that end state. Throughout it, the cloud remains available for tasks that exceed local capability. Early adopters route to the cloud selectively. Later adopters rarely need to. The architecture accommodates both without changing.
The cloud model is not merely inefficient for physical AI. It is architecturally incompatible with it.
This architecture does not just accommodate physical AI. It is the architecture physical AI requires.
→ [18, 19, 20]
The current AI infrastructure buildout is premised on inference at scale — serving billions of queries continuously through cloud architecture. The ongoing energy cost of AI lives here, not in training. Moving inference to the edge eliminates this cost at the cloud level. The aggregate power consumption of edge devices is not zero, but local inference avoids the data center overhead, network transport, and always-on provisioned capacity of remote inference, so the net reduction is substantial. AI's ongoing energy draw becomes a bounded problem rather than an open-ended one.
Beyond energy, the continuous internet infrastructure that exists primarily to shuttle queries to models and return answers becomes largely unnecessary for AI interaction. The infrastructure problem becomes one of occasional high-quality sync windows rather than permanent fat pipes — a categorically different and cheaper problem.
The mobile phone leapfrogged landline infrastructure across much of the developing world. This architecture enables a more complete leapfrog for AI and, ultimately, general connectivity. The only infrastructure requirement is the device itself and an occasional low-bandwidth sync window — deliverable by satellite, intermittent Wi-Fi, or peer-to-peer mesh networks sharing weight updates between devices.
Nations that have resisted cloud AI adoption because it meant routing citizens' data through foreign infrastructure gain an alternative that requires no such compromise. Intelligence becomes a local resource rather than a foreign service. A government can cut internet access. It cannot cut intelligence that is already local.
If inference moves to the edge permanently, the semiconductor industry's priorities shift. The device chip must run world model inference and the augmented personal model continuously, efficiently, and securely. Power efficiency, unified memory, neural processing unit integration, and hardware-level secure enclaves become the metrics that matter.
A new measure emerges to replace tokens-per-second: intelligence per milliwatt. Whoever defines and wins that metric defines the next era of semiconductors. Hardware secure enclaves for the augmented personal model become a durable competitive moat.
→ [16, 17]
The architecture maps naturally onto human cognition. The personal adapters correspond to episodic and autobiographical memory — individual, never shared wholesale. The world model corresponds to semantic memory — general knowledge of how things work. The augmented personal model corresponds to executive function — dynamically combining both in context.
These systems in the human mind do not merge. They are accessed and combined dynamically, in response to context. The architecture didn't invent this separation — it encodes something cognition already figured out.
The prevailing assumption in AI infrastructure is that intelligence scales with physical build-out. More compute means more capability. More capability means more data centers. More data centers mean more cooling, more power, more fiber, more buildings, more land, more water. This is not a temporary phase. It is the foundational logic of the cloud model — and its environmental consequences are structural outputs of that model, not incidental side effects.
This architecture inverts that assumption. Intelligence stops being something you build out and starts being something you compress inward.
The cloud model treats intelligence as a utility that must be generated centrally and transmitted. This architecture treats intelligence as a property that can be stored locally and updated periodically. The environmental difference between those two models is the difference between a power grid and a battery.
The public conversation about AI's energy footprint focuses heavily on training — the large, visible, one-time cost of building a model. But training is bounded and periodic. Inference is continuous, and it scales with every user on earth, every query, every day. Moving inference to the edge eliminates this ongoing cost at the cloud level. Training infrastructure remains — but training a model once and distributing its weights is categorically different from serving that model's outputs billions of times daily.
Data centers are large consumers of water, used primarily for cooling — a cost largely invisible in public accounting of AI's environmental impact. The water footprint of a single large data center can rival that of a small city. As construction accelerates globally, this burden falls on local watersheds and aquifers, often in regions already under stress, chosen precisely because land and power are cheap.
Cloud infrastructure does not distribute its environmental burden evenly. Data centers cluster near cheap power and water — which means specific regions, specific communities, and specific ecosystems absorb disproportionate costs. Rivers near data center clusters run warmer. Local grids are stressed. Distributed edge intelligence diffuses this burden across billions of devices that people are already powering. No community bears a concentrated environmental cost. No watershed is selected for proximity to cheap land.
The current dominant metric in AI hardware is tokens per second — optimized for cloud-scale inference serving. For the edge architecture, the right metric is different: intelligence per milliwatt — how much genuine reasoning capability can be delivered per unit of energy on a constrained device. This reorientation builds energy efficiency into the fundamental design objective of the next generation of AI hardware, rather than treating it as a secondary concern.
The history of energy suggests that distributed storage is more resilient, more equitable, and ultimately more efficient than centralized generation and transmission at scale. The same logic applies to intelligence.
→ [16, 17]
This paper is public domain. The ideas here belong to no one. They are offered to the commons in the same spirit that the foundational architectures of the internet were offered — not as products to be extracted from, but as gifts to a future that will build on them in ways their originators could not predict.
If something here is useful, use it. If something here is wrong, correct it publicly. If something here sparks a better idea, share that too.
This architecture is implementable today on any device with a neural processing unit and hardware secure enclave — Apple A/M-series with Secure Enclave, Qualcomm Snapdragon with Hexagon NPU and TrustZone, and equivalents. The world model runs on existing sparse-MoE runtimes. Personal adapters use on-device LoRA training loops. The interpreter head ships as a static frozen ONNX or CoreML module. Sync logic is a background asset pipeline with delta patching.
No new silicon or OS primitives are required. The architecture composes what already ships in 2026 consumer hardware. The only thing missing is the decision to build it this way.
When readers encounter the edge AI architecture described in *Intelligence at the Edge of the Cloud*, the most common instinct is to reach for federated learning as a comparison. The surface similarities are real: both keep raw data on device, both involve some relationship between device and server, both claim privacy properties. But the underlying privacy models are categorically different — not incrementally, not in degree, but in kind. This paper makes that distinction precise. Federated learning privatizes data while the server continues to learn from devices. The broadcast sync architecture eliminates the server's ability to learn from devices entirely. Those are different guarantees with different threat models, different attack surfaces, and different trust requirements. Understanding why matters both for evaluating the architecture and for knowing when each is the right choice.
This paper is a companion to: Horvat, J. (2026). *Intelligence at the Edge of the Cloud: A Local Architecture for the AI Era*. Public domain.
Federated learning is the right frame to reach for. It is the most prominent privacy-preserving machine learning paradigm, it involves devices and servers, it keeps raw training data local, and it has a substantial research literature behind it. When someone reads an architecture proposal that also involves devices, servers, and local data, federated learning is the natural reference point.
The comparison is not wrong. The two share a family resemblance. Both are responses to the same problem: personal data is valuable for training AI systems, but centralizing that data creates privacy risks and concentrations of power. Federated learning was designed to solve that problem. The broadcast sync architecture described here was designed to solve a different but related problem: personal data should never reach the server at all, not even in derivative form.
That difference — derivative form — is where the architectures diverge. Federated learning transmits gradients, not raw data, and treats the distinction as a privacy guarantee. The broadcast sync architecture transmits nothing from the personal model, and treats the absence of transmission as the guarantee. These are not the same claim.
Federated learning asks: can we learn from data without seeing the data? The broadcast sync architecture asks: can we build personal intelligence without the server learning anything at all?
Federated learning, introduced by McMahan et al. in 2017, is a distributed training protocol in which a central model is improved using data that never leaves client devices. Devices download the current model, train locally on their private data, compute gradient updates, and send those gradients — not the raw data — to the server. The server aggregates gradients from many devices, updates the central model, and distributes the improved model back. The cycle repeats.
This is a genuine privacy improvement over centralizing raw data. The training data never leaves the device. The server never stores personal records. From a legal and operational standpoint, federated learning is substantially better than collecting data centrally.
The protocol has three components that matter for privacy analysis: local training, gradient transmission, and server aggregation. The first is entirely private. The third happens server-side and involves no individual device data. The second — gradient transmission — is where the privacy question lives.
The critical difference lies in the transmission step. Federated learning requires the device to transmit a gradient. The broadcast sync architecture has no step in which the device transmits anything about the user. The personal model's training signal never leaves the hardware boundary.
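The three-component cycle reads concretely in a toy sketch of one federated averaging round (a simplification of the McMahan et al. protocol, with a linear model standing in for the network):

```python
import numpy as np

# Toy sketch of federated averaging: local training, gradient
# transmission, server-side aggregation. The linear model and data are
# stand-ins; the transmission step is the channel the text analyses.

rng = np.random.default_rng(0)
true_w = np.array([1.0, -2.0, 0.5])
clients = [rng.normal(size=(10, 3)) for _ in range(4)]  # private local data
targets = [x @ true_w for x in clients]

def local_gradient(w, x, y):
    """Component 1: computed privately, on-device."""
    return x.T @ (x @ w - y) / len(x)

def fedavg_round(w, lr=0.1):
    # Component 2: each client transmits its gradient to the server.
    grads = [local_gradient(w, x, y) for x, y in zip(clients, targets)]
    # Component 3: the server aggregates and updates the central model.
    return w - lr * sum(grads) / len(grads)

def total_loss(w):
    return sum(np.mean((x @ w - y) ** 2) for x, y in zip(clients, targets))

global_w = np.zeros(3)
before = total_loss(global_w)
for _ in range(20):
    global_w = fedavg_round(global_w)
after = total_loss(global_w)
print(after < before)
```

The central model improves without ever seeing `clients` directly; everything it learns arrives through the transmitted gradients, which is exactly why that channel carries the residual privacy risk.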
→ [1]
The federated learning privacy argument rests on the assumption that gradients are safe to share — that while raw data is sensitive, the mathematical derivatives of training are not. This assumption was overturned in 2019.
Zhu, Liu, and Han demonstrated that private training data can be reconstructed from shared gradients with high fidelity. Their attack — Deep Leakage from Gradients — works by optimizing a dummy input to match the observed gradient. Given the gradient and the model architecture, an attacker can recover the original training sample. For images, recovery is pixel-accurate. For text, recovery is token-accurate.
The attack proceeds as follows. An attacker with access to a client's gradient — which in federated learning includes the server itself — initializes a random dummy input and computes its gradient using the shared model architecture. The attacker then optimizes the dummy input by minimizing the distance between its gradient and the observed client gradient. As the optimization converges, the dummy input approaches the actual training data.
The result: gradients leak the data they were computed from. Not probabilistically, not in aggregate, but directly and specifically. The training data can be reconstructed from the gradient alone.
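The leakage is easiest to see in the simplest case. For a single linear layer and a single training sample, the input can be read off the gradient exactly, with no optimization needed; the full DLG attack generalizes this recovery to deeper models via gradient matching:

```python
import numpy as np

# Why gradients leak data, in the simplest case: a linear layer z = Wx + b
# on one sample. Then dL/dW = (dL/dz) x^T and dL/db = dL/dz, so any row of
# dL/dW divided by the matching entry of dL/db recovers x exactly. The
# full DLG attack (Zhu et al., 2019) extends this via optimization.

rng = np.random.default_rng(0)
x = rng.normal(size=5)               # the "private" training input
W = rng.normal(size=(3, 5))
b = rng.normal(size=3)

z = W @ x + b
dL_dz = z                            # e.g. for the loss L = 0.5 * ||z||^2
grad_W = np.outer(dL_dz, x)          # what a federated client transmits
grad_b = dL_dz

# The attacker sees only grad_W and grad_b, and reads the input back out:
i = int(np.argmax(np.abs(grad_b)))   # any row with nonzero dL/dz works
reconstructed = grad_W[i] / grad_b[i]
print(np.allclose(reconstructed, x))
```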
Subsequent work by Geiping et al. (2020) extended the attack to high-resolution images at larger batch sizes, making it practical in realistic federated learning deployments — not just in carefully constructed experiments. The vulnerability is not a theoretical edge case. It is an active research area precisely because the attack works.
Defenses exist — differential privacy noise, gradient compression, secure aggregation — but each involves tradeoffs. Differential privacy noise degrades model quality. Gradient compression loses information that may be useful for training. Secure aggregation adds computational overhead and cryptographic complexity. None eliminates the channel through which the attack flows. They make the attack harder; they do not remove the attack surface.
The federated learning privacy guarantee is: we do not collect your data, and we make it difficult to reconstruct your data from the derivative signal we do collect. The broadcast sync guarantee is: there is no derivative signal. There is nothing to reconstruct.
→ [2, 3]
The broadcast sync architecture does not modify federated learning's communication protocol to make it safer. It eliminates the communication protocol for the personal model entirely.
The personal model — the component that knows the user — trains through behavioral observation on-device. It observes how the user interacts: when they rephrase a query, when they accept a response and move on, when they push deeper. These signals update the low-rank adapter weights that encode the user's patterns. No gradient is computed with respect to an external objective. No update is transmitted. No server is involved in the personal model's improvement at any stage.
The world model — the component that knows the world — is improved entirely separately. It is trained on data that has no relationship to any specific user. Its weights are distributed via broadcast: a one-directional transmission with no request-response channel. The server broadcasts to every device. It does not know which devices received what. It does not know which users exist. There is nothing to aggregate, nothing to invert, nothing to subpoena.
The threat models are not just different in degree. They are structurally different. Federated learning's residual risk lives in a communication channel that is fundamental to its operation — you cannot remove gradient transmission and still have federated learning. The broadcast sync architecture's residual risk lives in the physical device — a local, hardware-bounded surface that requires proximity to exploit. One attack surface is remote and scalable. The other is local and resource-intensive.
→ [1, 2, 3, 4]
| Property | Federated Learning | Broadcast Sync Architecture |
|---|---|---|
| Raw data transmission | Never transmitted | Never transmitted |
| Gradient transmission | Required — fundamental to protocol | None — no transmission from personal model |
| Server learns from user | Yes — central model improves from aggregated gradients | No — world model trained on separate data entirely |
| Gradient inversion risk | Present — active attack vector, pixel-accurate reconstruction demonstrated | Eliminated — no gradient to invert |
| Server fingerprint | Yes — server knows which devices participated and when | No — broadcast has no request-response; server cannot identify receivers |
| Legal compellability | Server holds gradient history — subject to legal process | Server holds no user-derived data — nothing to compel |
| Trust requirement | Honest-but-curious server is a threat model concern | Server cannot learn from devices regardless of intent |
| Personal model improves from | Aggregated device gradients — cross-device learning | Behavioral observation — single device, no external signal |
| Residual attack surface | Remote, scalable — gradient interception over network | Local, physical — secure enclave on device |
The table makes the structural difference visible. Every row where federated learning has a residual risk traces back to the gradient transmission channel. Every row where the broadcast sync architecture has no risk traces back to the absence of that channel. The privacy guarantees are not on the same spectrum. They are different architectures for different threat models.
Federated learning's gradient transmission is not only a vulnerability. It is also a feature. By aggregating gradients from millions of devices, federated learning can improve a central model using patterns that exist across the user population — patterns that no individual device could observe alone. A keyboard that improves its predictions by learning from all users collectively is using federated learning's core capability. The privacy cost and the capability are inseparable.
The broadcast sync architecture gives this up entirely. The world model improves through centralized training on data that has no connection to individual users. The personal model improves through observation of a single user. There is no cross-device learning, no population-level signal, no capability that emerges from aggregating across users. What is gained in privacy is paid for in this capability.
Whether that tradeoff is worth making depends on what you are building. For a keyboard prediction model, federated learning's cross-device learning is the entire point — the model gets better precisely because it learns from everyone's typing patterns. For a personal intelligence system that knows deeply private context — health, finances, relationships, professional reasoning — the cross-device capability matters less and the privacy guarantee matters more. The architecture should match the threat model the application actually faces.
Federated learning is the right answer when improving a shared model from distributed data is the goal. The broadcast sync architecture is the right answer when the personal model must be completely sovereign and the world model can be trained on other data. These are different use cases, not competing solutions to the same problem.
Federated learning is the better fit when:

- The primary goal is improving a shared model from distributed data.
- The privacy requirement is stronger than raw-data centralization but does not need to eliminate derivative-signal transmission.
- The use case benefits from cross-device learning, because population-level patterns improve the shared model meaningfully.
- The application can accept and mitigate gradient inversion risk through differential privacy or secure aggregation.

Examples: shared keyboard prediction, medical image classification across hospitals, fraud detection across financial institutions.
The broadcast sync architecture fits when:

- The personal model handles deeply private context — health, finances, professional reasoning, relationships — where even derivative signal transmission is unacceptable.
- The application operates in environments where connectivity is unreliable or latency is critical.
- The world model can be trained on data that is not derived from individual users.
- Legal risk requires that the server hold nothing that can be compelled.
- The use case requires intelligence that is specifically calibrated to one person and must not contribute to a shared model.

Examples: personal AI assistant, physical AI systems with operator-specific calibration, sovereign AI for national or organizational use.
One further difference deserves explicit attention. Federated learning generates a communication record — the server knows which devices participated in which training rounds and when. This record exists even when gradients are protected by secure aggregation. It constitutes a participation fingerprint: evidence that a device was present and active, even if the content of its contribution is hidden.
The broadcast sync architecture generates no such record. The server broadcasts without knowing who receives. There is no participation fingerprint. A device that syncs is indistinguishable from a device that does not, from the server's perspective. This matters in adversarial contexts — legal, national security, or personal safety — where even the fact of participation is sensitive.
→ [1, 2, 3, 4, 5]
The dominant metric for AI hardware performance is tokens per second — a supply-side measure optimized for cloud serving infrastructure. As inference moves from data centers to edge devices, tokens per second measures the wrong thing. It rewards architectures that excel at batch serving over high-bandwidth memory while ignoring the constraint that actually binds at the edge: power. This paper proposes Intelligence Per Milliwatt (IpMW) as the correct metric for the edge AI era, defines it formally as a composite of reasoning capability, personalization delta, and degradation honesty normalized to average inference power draw, and derives the architectural implications. We argue that IpMW is to edge AI what miles per gallon was to the automobile industry — a demand-side metric that, once standardized, restructures an entire industry's design priorities.
This paper is a companion to: Horvat, J. (2026). Intelligence at the Edge of the Cloud: A Local Architecture for the AI Era. Public domain.
Tokens per second became the dominant AI hardware benchmark because it measures the thing that matters most in cloud serving: throughput. A data center running inference for millions of users needs to move tokens fast. Every millisecond of latency at scale multiplies into real cost. Every token per second of throughput gained is revenue earned or infrastructure cost avoided. The metric is not arbitrary — it was the right measure for the problem that existed.
That problem is not the only problem. As inference moves permanently to edge devices — smartphones, wearables, embedded systems, physical AI platforms — the constraint that binds is no longer throughput. It is power. A device with 3,000 milliwatts of sustained power budget for AI inference cannot run a model that draws 8,000 milliwatts at peak, regardless of how many tokens per second it produces at that draw. Battery life, thermal envelope, and die area all impose hard limits that tokens per second ignores entirely.
Optimizing for tokens per second at the edge produces the wrong architecture. It rewards large batch processing, high memory bandwidth, and parallel activation — properties that favor server-class hardware. It penalizes the sparse activation patterns, aggressive quantization, and selective expert loading that make edge inference efficient. A chip that delivers 200 tokens per second at 4,000 milliwatts scores higher on that metric than one delivering 180 tokens per second at 800 milliwatts, even though the second chip delivers far more value per unit of energy spent.
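The arithmetic behind that comparison is worth making explicit. A minimal sketch, using the chip figures from the example above, converts throughput and power draw into tokens per joule:

```python
def tokens_per_joule(tokens_per_second: float, power_mw: float) -> float:
    """Energy efficiency: tokens produced per joule consumed.

    power_mw is milliwatts; 1 W = 1 J/s, so dividing throughput
    by watts yields tokens per joule.
    """
    return tokens_per_second / (power_mw / 1000.0)

# The two chips from the example above:
chip_a = tokens_per_joule(200, 4000)  # 50.0 tokens/J
chip_b = tokens_per_joule(180, 800)   # 225.0 tokens/J

# Chip B loses on raw throughput but wins 4.5x on energy efficiency.
assert chip_b > chip_a
```

Tokens per second picks chip A; any energy-normalized metric picks chip B by a wide margin.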
Tokens per second is a supply-side metric. It measures what the hardware produces. The edge AI era needs a demand-side metric — one that measures what the user receives per watt spent.
→ [1, 2]
The distinction between supply-side and demand-side metrics matters more than it might appear. Supply-side metrics measure what a system can produce under favorable conditions — peak throughput, maximum bandwidth, theoretical TOPS. They are useful for comparing hardware capabilities in isolation. They are poor predictors of user value in deployment.
Demand-side metrics measure what the user actually receives per unit of resource consumed. Miles per gallon is the canonical example. Internal combustion engines were optimized for decades on horsepower — a supply-side metric measuring what the engine could produce. When fuel economy standards forced the industry to optimize on miles per gallon — a demand-side metric measuring value delivered per gallon consumed — the entire architecture of engine design changed. Direct injection, variable valve timing, cylinder deactivation, turbocharging for efficiency rather than power — all of these emerged from optimizing for the right metric.
The AI hardware industry is at an analogous inflection point. The data center era optimized on supply-side metrics because cloud economics rewarded throughput. The edge era changes the economics fundamentally. The user cares about whether the device is intelligent, whether it knows them specifically, and whether it lasts through a full day of use. None of those outcomes are predicted by tokens per second.
Intelligence Per Milliwatt is the miles per gallon of edge AI. It measures value delivered — intelligent, personalized, honest responses — per unit of energy consumed. Optimizing for it produces different architectural choices than optimizing for throughput, and those choices are the right ones for the edge deployment context.
A useful metric requires a precise definition. IpMW has two components: a numerator that measures intelligence delivered and a denominator that measures energy consumed. Both require careful specification.
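The component descriptions that follow suggest one plausible formalization. This is a sketch, not a fixed standard: the linear aggregation and the weights wR, wP, wD are assumptions, chosen here only to make the structure concrete.

```latex
\mathrm{IpMW} \;=\; \frac{w_R\,R \;+\; w_P\,P\Delta \;+\; w_D\,DH}{\bar{P}_{\text{mW}}},
\qquad w_R + w_P + w_D = 1
```

where R is reasoning capability normalized to the reference cloud baseline, PΔ the personalization delta, DH degradation honesty, and P̄ the average inference power draw in milliwatts over the full inference cycle.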
The three-component numerator is deliberate. A metric that measured only reasoning capability would reward large models with high benchmark scores regardless of whether they are honest about their limits or calibrated to the user. A metric that measured only personalization delta would reward systems that over-fit to user history at the expense of general capability. Degradation honesty is the component most absent from existing edge AI benchmarks and most consequential for real deployment — a system that confidently hallucinates is actively harmful regardless of its average accuracy.
→ [3, 4]
R measures whether the edge device can actually think — not just generate tokens, but produce responses that reflect genuine reasoning about the query. Normalizing against a reference cloud baseline matters: it grounds the score in something meaningful rather than an arbitrary point scale. A device scoring R = 0.75 delivers reasoning quality at 75% of a capable cloud model on covered domains. That is a concrete, interpretable claim.
Importantly, R is measured only within the device's declared expert coverage domain. A device that covers quantum computing but not genomics should not be penalized on genomics tasks — it should score on what it claims to cover. This connects R to the coverage score: a device with an accurate coverage score and high R within that coverage is genuinely more useful than a device with high R but poor coverage prediction.
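That measurement rule can be sketched directly. The function and score values below are hypothetical, assuming per-domain quality scores exist for both the device and the reference cloud model; the point is that domains outside the declared coverage are excluded, not zero-scored:

```python
def reasoning_score(edge_quality, cloud_quality, declared_coverage):
    """R: mean edge/cloud quality ratio, restricted to declared domains.

    edge_quality, cloud_quality: dicts mapping domain -> quality score.
    declared_coverage: set of domains the device claims to cover.
    """
    covered = [d for d in declared_coverage if d in cloud_quality]
    ratios = [edge_quality[d] / cloud_quality[d] for d in covered]
    return sum(ratios) / len(ratios)

# Illustrative scores: the device declares two domains; genomics is
# outside its declared coverage and is not scored against it.
edge = {"quantum_computing": 0.60, "chemistry": 0.72}
cloud = {"quantum_computing": 0.80, "chemistry": 0.90, "genomics": 0.85}

r = reasoning_score(edge, cloud, {"quantum_computing", "chemistry"})
```

With these illustrative numbers the device scores R ≈ 0.775 on what it claims to cover, and its genomics gap shows up in the coverage score rather than dragging R down.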
PΔ captures the value added by personal calibration. A device running only the base model has PΔ = 0. As the personal adapter learns the user's patterns, preferred reasoning styles, domain knowledge, and communication preferences, PΔ rises. A device with PΔ = 0.25 delivers responses 25% more useful on user-specific tasks than the same base model without calibration.
This component is what makes IpMW a metric for personal AI rather than just efficient AI. Two devices with identical R scores can have very different real-world value if one knows the user and the other does not. PΔ makes that difference visible in the benchmark.
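A minimal sketch of the delta, assuming a task-usefulness score that can be measured with and without the personal adapter (function and argument names are illustrative):

```python
def personalization_delta(score_with_adapter: float, score_base: float) -> float:
    """PΔ: relative usefulness gain from personal calibration.

    A freshly deployed device scores identically with and without
    its (empty) adapter, so PΔ = 0. PΔ = 0.25 means responses are
    25% more useful on user-specific tasks than the base model's.
    """
    return (score_with_adapter - score_base) / score_base

assert personalization_delta(0.70, 0.70) == 0.0  # empty adapters
# personalization_delta(0.875, 0.70) ≈ 0.25      # calibrated device
```

Because PΔ is relative to the same base model, it isolates the value of calibration from the value of raw capability, which R already captures.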
DH is the most novel component and deserves the most careful definition. It measures the calibration of the system's uncertainty — specifically, how accurately the stated coverage score predicts actual performance when the system operates outside its well-covered domains.
A system that scores DH = 1.0 is perfectly calibrated: when it says 80% coverage, it achieves 80% accuracy on those tasks. A system that says 95% coverage but achieves 60% accuracy, or says 40% coverage but achieves 85% accuracy, scores poorly on DH. The goal is not maximum confidence — it is accurate confidence. A system that honestly says "I don't know" is more useful than one that confidently gives a wrong answer, and IpMW rewards that honesty explicitly.
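One plausible formalization of DH is an inverted calibration error over stated-coverage/achieved-accuracy pairs. The linear penalty below is an assumption for illustration, not the paper's fixed definition:

```python
def degradation_honesty(pairs):
    """DH: 1 minus the mean absolute gap between stated coverage
    and achieved accuracy on the corresponding tasks.

    Perfect calibration scores 1.0. Both overconfidence (says 0.95,
    achieves 0.60) and false modesty (says 0.40, achieves 0.85) are
    penalized symmetrically: the goal is accurate confidence.

    pairs: iterable of (stated_coverage, achieved_accuracy) tuples.
    """
    gaps = [abs(stated - achieved) for stated, achieved in pairs]
    return 1.0 - sum(gaps) / len(gaps)

assert degradation_honesty([(0.8, 0.8)]) == 1.0               # calibrated
assert round(degradation_honesty([(0.95, 0.60)]), 2) == 0.65  # overconfident
```

A production definition might weight overconfidence more heavily than underconfidence, since a confident hallucination is the more harmful failure; that asymmetry is a design choice this sketch leaves out.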
→ [3, 5]
IpMW is not just a measurement tool. It is a design objective. The architectural choices that maximize IpMW are different from those that maximize tokens per second, and those differences tell us what edge AI hardware should look like.
The optimization map reveals something important: the architectural choices that maximize IpMW are precisely the choices that define the broadcast sync architecture described in the companion paper. Sparse MoE world models, usage-profile quantization, behavioral adapter training, coverage score computation, and a frozen interpreter head are not incidental features — they are the design choices that emerge from optimizing for the right metric.
This is not a coincidence. The architecture was designed for the edge deployment context, and IpMW is the metric that captures that context. Tokens per second would have produced a different architecture. The metric defines the design space.
→ [1, 2, 3, 6]
A metric is only as useful as its measurement methodology. IpMW requires a standardized benchmark suite to be comparable across devices and architectures. The following outlines what that suite should contain.
The benchmark has one deliberate complexity: PΔ requires a calibrated device. A freshly deployed device with empty adapters has PΔ = 0 by definition. Published benchmarks should specify the calibration state — typically 10, 50, or 100 hours of interaction — and report PΔ at each milestone. This makes the benchmark more complex but more honest: it reveals how quickly the device becomes useful for a specific user, not just how capable it is out of the box.
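One way a published result under that convention might be structured is sketched below. The field names and all numeric values are purely illustrative, not a proposed reporting standard:

```python
from dataclasses import dataclass, field

@dataclass
class IpMWReport:
    """One device's benchmark result at stated calibration milestones."""
    device: str
    reference_baseline: str   # which cloud model R is normalized against
    avg_power_mw: float       # denominator: full-system average draw
    r_score: float            # reasoning, within declared coverage
    dh_score: float           # degradation honesty (calibration accuracy)
    p_delta_by_hours: dict = field(default_factory=dict)  # PΔ at 10/50/100 h

# Illustrative values only — not measurements of any real device.
report = IpMWReport(
    device="example-edge-npu",
    reference_baseline="2026 cloud reference, full precision",
    avg_power_mw=850.0,
    r_score=0.75,
    dh_score=0.90,
    p_delta_by_hours={10: 0.05, 50: 0.15, 100: 0.25},
)
```

Reporting PΔ per milestone rather than as a single number is what makes the calibration curve visible: two devices with the same 100-hour PΔ can differ sharply in how quickly they become useful.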
Reference hardware for normalization should be updated annually as the frontier advances. The 2026 reference baseline is a capable cloud model at full precision. As edge devices approach cloud parity in covered domains, R scores will naturally rise. The metric scales with the frontier rather than becoming obsolete as hardware improves.
→ [3, 4, 5, 8]
Metrics are not neutral. They are design forces. The metric an industry optimizes for determines what gets built, what gets funded, and what gets rewarded. Tokens per second built data centers. TOPS built NPUs. Neither metric built the device that knows you, runs offline, and lasts all day. Intelligence per milliwatt builds that device.
The historical precedent is instructive. Before fuel economy standards, automobile engines were optimized for horsepower because that was what manufacturers competed on and what buyers nominally wanted. The introduction of miles per gallon as a regulatory and consumer metric forced a rethinking of engine architecture from first principles. Technologies that had existed for decades became economically compelling overnight because the metric now rewarded them. The same dynamic is available to edge AI hardware.
In 2026, edge AI chips are already competing on efficiency — the industry consensus is that efficiency is king, and the winners will deliver the most AI per joule. But without a standardized metric that captures the full picture of what "intelligence" means at the edge — reasoning, personalization, and honest uncertainty — competition optimizes for partial proxies. TOPS is a partial proxy. Tokens per second is a partial proxy. IpMW is not a partial proxy. It measures what the user actually receives.
Whoever defines this metric first, and builds hardware that wins on it, defines the next generation of edge AI silicon. The metric creates the battleground. The battleground creates the winners. The winners shape what AI looks like for the next decade of devices.
The question is not whether AI will run at the edge. It will. The question is which metric defines what "better" means when it gets there. That question is still open. This paper proposes an answer.
→ [1, 2, 6, 7]
| Property | Tokens Per Second | Intelligence Per Milliwatt |
|---|---|---|
| Metric type | Supply-side — measures hardware output | Demand-side — measures user value received |
| Optimized for | Cloud — data center batch serving | Edge — on-device personal inference |
| Power sensitivity | None — does not account for power draw | Central — power is the denominator |
| Personalization | Invisible — a generic model and a calibrated one score identically | Explicit — PΔ component captures personal calibration value |
| Uncertainty honesty | Invisible — hallucination and accurate response score the same | Explicit — DH component penalizes overconfident hallucination |
| Architectural reward | Full model activation, high memory bandwidth, large batch size | Sparse MoE, usage-profile quantization, behavioral adaptation |
| Binding constraint | Compute throughput and memory bandwidth | Power envelope and thermal budget |
| Measurement scope | Single inference pass, often NPU or GPU only | Full system across complete inference cycle including enclave |
| Scales with frontier | Requires recalibration as model sizes grow | Normalized to reference baseline — scales naturally |
| Analogy | Horsepower — measures engine output | Miles per gallon — measures value delivered per resource consumed |
Process engineering has spent sixty years developing formal methodologies for identifying and managing failure modes in complex systems where errors have physical consequences. The most widely used of these — Hazard and Operability Study, or HAZOP — is a structured, node-by-node analysis that asks what happens when each component of a system deviates from its design intent. The methodology produces a formal register with specified causes, consequences, safeguards, and required actions. That register has legal standing. It gets reviewed. When the system changes, it gets updated.
AI systems deployed in consequential contexts — physical AI, medical AI, infrastructure AI, any AI where the output affects the physical world — have no equivalent. Failure modes are discovered in production. Responses are improvised. Nobody decided how the system fails. This paper argues that HAZOP applies directly to AI systems, demonstrates the methodology on a specific architecture, and proposes a standard for what an AI HAZOP should produce.
This paper is a companion to: Horvat, J. (2026). Intelligence at the Edge of the Cloud: A Local Architecture for the AI Era. Public domain.
In process engineering, the question of how a system fails is not left to chance or discovered in production. It is decided — deliberately, formally, at the design stage — before a single valve is specified or a pipe is routed. A cooling water valve fails open because losing cooling is the catastrophic outcome. A fuel supply valve fails closed because uncontrolled fuel flow is the catastrophic outcome. The failure mode is a design decision, made by a process engineer, documented in the process design package, and enforced in hardware. The valve does not decide how to fail. The system designer decides.
This principle — that failure modes must be specified, not discovered — is the foundation of process safety. It emerged from catastrophic industrial accidents in the mid-twentieth century, most notably Flixborough in 1974, where a temporary pipe bypass failed under pressure, killing 28 people and destroying the plant. The accident investigation revealed that nobody had formally analyzed what would happen if the bypass failed. The consequences were not unknown — they were simply unasked.
AI systems in 2026 are in approximately the same position that chemical plants were in 1963, when ICI first developed the operability study methodology. Complex. Consequential. Widely deployed. And almost entirely without formal failure mode analysis. When a cloud-dependent AI system loses connectivity, what happens? When the model receives data it was not trained on? When latency exceeds acceptable bounds? In most deployments, nobody decided. The system does whatever it does, and the response is improvised.
Process engineering solved this problem sixty years ago. The methodology transfers directly. The only thing missing is the decision to apply it.
→ [1, 2]
A Hazard and Operability Study is a structured examination of a system, conducted by a multidisciplinary team, that systematically identifies all credible deviations from design intent and their consequences. It produces a formal register — the HAZOP register — that documents every identified hazard, its causes, consequences, existing safeguards, and required actions. The register is the deliverable. It has legal standing in regulated industries. It is updated when the system changes.
The methodology proceeds in three steps. First, the system is divided into nodes — discrete sections, each with a defined design intent. In process piping, nodes are typically pipe sections or vessels. In AI systems, nodes are functional components. Second, parameters relevant to each node are identified — in process systems these are physical quantities like flow, pressure, temperature, and composition. In AI systems the equivalents are data, signal, weights, context, connectivity, latency, and confidence. Third, guide words are applied systematically to each parameter to generate deviations. The guide words are the engine of the methodology.
| Guide Word | Definition | Process Example | AI Equivalent |
|---|---|---|---|
| No / None | Complete negation of design intent | No flow in a transfer line | No sync received · No data to model · No response generated |
| More | Quantitative increase beyond design | High pressure in vessel | More queries than capacity · More data than expected · Higher confidence than warranted |
| Less | Quantitative decrease below design | Low flow in cooling line | Less training data · Lower sync frequency · Reduced coverage |
| As Well As | Qualitative addition to design intent | Contamination in process stream | Poisoned weights alongside valid ones · Personal data in world model sync |
| Part Of | Qualitative reduction of design intent | Partial composition only | Incomplete sync · Partial adapter update · Truncated response |
| Reverse | Logical opposite of design intent | Reverse flow in pipe | Model transmits instead of receives · Adapter trains on wrong signal |
| Other Than | Complete substitution of design intent | Wrong chemical in line | Wrong model version synced · Query routed to wrong expert · Wrong user context applied |
Not every guide word applies to every parameter at every node. The methodology does not require exhaustive application — it requires systematic application of all credible deviations. A combination that produces no meaningful deviation is simply not recorded. The skill is in identifying which combinations matter, which requires domain knowledge of both the methodology and the system being analyzed.
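The systematic step — apply each guide word to each parameter at each node, then keep only the credible deviations — can be sketched as a plain enumeration. The node and parameter lists below are a subset of the tables in this paper; the credibility filter itself is the domain-knowledge step that no code can supply:

```python
from itertools import product

GUIDE_WORDS = ["No/None", "More", "Less", "As Well As",
               "Part Of", "Reverse", "Other Than"]

# Node -> parameters relevant at that node (illustrative subset).
NODES = {
    "Sync Channel": ["Connectivity", "Composition", "Flow"],
    "World Model": ["Signal", "Flow", "Composition"],
    "Personal Adapters": ["Flow", "Reaction"],
}

def candidate_deviations(nodes=NODES, guide_words=GUIDE_WORDS):
    """Yield every (node, parameter, guide word) combination.

    A real HAZOP workshop reviews each candidate and records only
    the credible ones; combinations producing no meaningful
    deviation are simply not written into the register.
    """
    for node, params in nodes.items():
        for param, word in product(params, guide_words):
            yield node, param, word

candidates = list(candidate_deviations())
# 8 parameters x 7 guide words = 56 candidates for the team to review.
```

The value of the enumeration is completeness of consideration, not completeness of the register: every combination gets asked, and most get discarded.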
→ [1, 3]
The translation from process parameters to AI parameters is direct. Each physical quantity in a process system has a functional equivalent in an AI system. The mapping is not metaphorical — it is structural. Both systems have inputs, throughputs, transformations, and outputs. Both have design intents that can be deviated from. Both have consequences when deviations occur.
| Process Parameter | AI Equivalent | Design Intent Example |
|---|---|---|
| Flow | Data / Signal / Query rate | Queries arrive at expected rate within model capacity |
| Pressure | Load / Demand / Inference queue depth | Inference queue depth remains within latency budget |
| Temperature | Latency / Thermal state of device | Response latency within acceptable bounds; device within thermal envelope |
| Composition | Model weights / Version / Data distribution | Weights are current, validated, and match expected architecture version |
| Level | Memory / Storage / Context window occupancy | Context window occupancy within model capacity; storage sufficient for adapters |
| Connectivity | Network / Sync channel availability | Sync channel available for scheduled weight updates |
| Reaction | Adapter training / Model update | Adapter weights update through behavioral observation within expected bounds |
| Signal | Confidence score / Coverage score / Output certainty | Coverage score accurately reflects actual domain coverage |
The node structure for an AI system follows the same logic as process piping — each node is a functional section where the design intent can be stated clearly enough to generate meaningful deviations. For the broadcast sync architecture analyzed in this paper, six nodes are defined: the sync channel, the world model, the personal adapters, the interpreter head, the coverage score mechanism, and the user interface. Each has a clear design intent. Each has parameters that can deviate. Each deviation has consequences that can be specified.
The following register applies the HAZOP methodology to the broadcast sync architecture described in Intelligence at the Edge of the Cloud. The register follows the standard nine-column format used in process industries: Node, Parameter, Guide Word, Deviation, Cause, Consequence, Existing Safeguards, Action Required, and Responsibility. Severity is indicated as H (High — safety or major function impact), M (Medium — degraded operation), or L (Low — minor impact, self-correcting).
Only credible deviations are recorded. Combinations that produce no meaningful consequence are omitted. The register is not exhaustive — a full production HAZOP would require a multi-day workshop with domain specialists. This register demonstrates the methodology and identifies the most significant hazards.
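The nine-column row format described above maps directly onto a record type. The sketch below is illustrative, with the severity scale from the register and one entry paraphrased from it; splitting severity into its own field (rather than prefixing the consequence text) is a representation choice, not part of the standard format:

```python
from dataclasses import dataclass
from enum import Enum

class Severity(Enum):
    H = "High — safety or major function impact"
    M = "Medium — degraded operation"
    L = "Low — minor impact, self-correcting"

@dataclass
class RegisterEntry:
    """One row of the HAZOP register (standard nine-column format)."""
    node: str
    parameter: str
    guide_word: str
    deviation: str
    cause: str
    consequence: str
    severity: Severity
    safeguards: str
    action_required: str
    responsibility: str

entry = RegisterEntry(
    node="N1",
    parameter="Connectivity",
    guide_word="No",
    deviation="No sync received — extended offline period",
    cause="Network unavailable; server offline; device in coverage gap",
    consequence="World model knowledge becomes stale; coverage degrades",
    severity=Severity.M,
    safeguards="Coverage score timestamps experts; user informed of sync date",
    action_required="Define maximum acceptable staleness threshold",
    responsibility="Arch. Design",
)
```

Keeping the register as structured records rather than a static table is what makes the document's maintenance requirement practical: entries can be queried by severity, filtered by open actions, and diffed when the system changes.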
| Node | Parameter | Guide Word | Deviation | Cause | Consequence | Existing Safeguards | Action Required | Resp. |
|---|---|---|---|---|---|---|---|---|
| Node 1 · Sync Channel — Design intent: World model weights broadcast periodically from server to device; device retains relevant experts locally | ||||||||
| N1 | Connectivity | No | No sync received — extended offline period | Network unavailable; server offline; device in coverage gap | M World model knowledge becomes stale. Coverage score degrades over time. System continues to function on existing weights. | Coverage score timestamps experts; DH component of IpMW degrades gracefully. User is informed of sync date. | Define maximum acceptable staleness threshold. Specify behavior when threshold exceeded — should system reduce confidence scores automatically? | Arch. Design |
| N1 | Composition | As Well As | Malicious weights alongside valid weights in broadcast | Compromised sync server; supply chain attack on model weights; adversarial weight injection | H Corrupted world model behavior. Potential for targeted misinformation, capability degradation, or covert data exfiltration via model behavior. | Human-gated publication — weights require deliberate authorization before broadcast. No automatic write path from training to sync server. | Implement cryptographic signing of weight packages. Device verifies signature before integrating sync. Define process for emergency weight revocation. | Security / Ops |
| N1 | Composition | Part Of | Incomplete sync — partial expert package received | Network interruption mid-sync; storage limit reached on device; timeout during large update | L Expert partially updated. Router may direct queries to expert with inconsistent weight version. Minor capability degradation in affected domain. | Delta-patching protocol; version tracking per expert. Incomplete sync detected by checksum mismatch. | Specify atomic sync behavior — either full expert package integrates or none does. No partial expert state permitted. | Arch. Design |
| N1 | Flow | More | Sync frequency higher than intended — excessive update rate | Server misconfiguration; CDN cache invalidation loop; client retry storm | L Excessive bandwidth consumption. Battery drain from repeated sync activity. No safety consequence. | Client-side rate limiting on sync requests. Exponential backoff on retry. | Define maximum sync frequency. Implement server-side rate limit per device cohort. | Platform Eng. |
| Node 2 · World Model — Design intent: Sparse expert model responds to queries from augmented personal model; pull-only; no personal data access; no transmission | ||||||||
| N2 | Signal | Reverse | World model initiates communication rather than responding to queries | Implementation defect; compromised runtime; malicious weight injection enabling active behavior | H Fundamental breach of privacy architecture. World model could exfiltrate personal context if it can initiate outbound communication. | Architectural constraint — world model has no transmission interface by design. OS-level network policy blocks outbound requests from world model process. | Enforce OS-level network isolation for world model process. Include in security audit scope. Add runtime assertion that world model never opens a network socket. | Security |
| N2 | Flow | No | World model unresponsive — no output to interpreter head | Model crash; out-of-memory; thermal throttling; corrupt expert weights | M Augmented personal model cannot synthesize world knowledge. System falls back to personal adapters only. Response quality degrades to base adapter capability. | Coverage score detects world model unavailability. User informed of degraded state. Personal adapters continue functioning. | Define explicit fallback behavior when world model is unavailable. Specify whether system should respond from personal adapters alone or decline the query. | Arch. Design |
| N2 | Composition | Other Than | Wrong expert activated for query domain | Router miscalibration; query domain ambiguous; expert boundary overlap in training | M Response draws on incorrect domain knowledge. Coverage score may not detect mismatch if router is confidently wrong. Plausible but incorrect output. | Coverage score includes router activation confidence. Low confidence triggers uncertainty signal to user. | Add domain classification validation layer between router and expert activation. Specify minimum router confidence threshold below which coverage score is automatically reduced. | ML Design |
| Node 3 · Personal Adapters — Design intent: Low-rank adapter weights encode user patterns through behavioral observation only; never transmitted; updated incrementally on-device | ||||||||
| N3 | Flow | Reverse | Personal adapter weights transmitted off-device | Implementation defect; malicious application accessing adapter via OS vulnerability; secure enclave breach | H Complete privacy breach. Personal adapter contains behavioral fingerprint of user. Transmission exposes identity, patterns, and potentially sensitive context. | Hardware secure enclave isolates adapter weights. No API exposes raw adapter weights externally. OS-level access controls. | Include adapter transmission prevention in security audit. Add runtime assertion monitoring for any adapter data leaving enclave boundary. Third-party security verification. | Security |
| N3 | Reaction | More | Adapter over-trains — excessive adaptation to recent interaction pattern | Unusual interaction session; adversarial prompt sequence designed to manipulate adapter; rapid behavior change in user | M Adapter over-fits to recent session. Prior user context underweighted. System behaves inconsistently across sessions. Potential for adversarial manipulation of personal model behavior. | Incremental update rate limiting. Exponential decay on adapter update magnitude. | Define maximum adapter update magnitude per session. Specify drift detection mechanism — if adapter weights change beyond threshold in single session, flag for review and limit update rate. | ML Design |
| N3 | Reaction | No | Adapter fails to update — no behavioral learning occurs | Observation signal below update threshold; enclave compute unavailable; adapter storage full | L Personalization delta (PΔ) remains at zero. System functions as base model only. No safety consequence — base model remains fully functional. | Coverage score reports personalization state. User can be informed if adapter calibration is not progressing. | Define minimum interaction rate required for adapter update. Specify storage management policy for adapter weights. | Platform Eng. |
| Node 4 · Interpreter Head — Design intent: Fixed non-trainable cross-attention module translates between world model and personal model embedding spaces; frozen at compile time | ||||||||
| N4 | Composition | Other Than | Interpreter head version mismatch — compiled against different base model version than currently installed | Base model sync updates embedding space geometry; interpreter head not recompiled; version management failure | M Cross-attention keys and queries misaligned across embedding spaces. Synthesis quality degrades. Coverage score may not detect geometric mismatch. Subtle but pervasive response quality degradation. | Version locking between base model and interpreter head. Sync rejected if interpreter head version does not match base model version. | Treat interpreter head as tightly coupled to base model version. Any base model sync must trigger interpreter head recompilation or sync rejection. Define versioning protocol. | Arch. Design |
| N4 | Signal | No | Interpreter head produces no output — cross-attention fails | Null world model output passed to cross-attention; dimension mismatch; numerical instability | L Synthesis step fails. System falls back to personal model response without world model integration. Response quality degrades but system does not crash. | Null output detection in synthesis pipeline. Fallback to personal model response with coverage score indicating world model unavailable. | Implement explicit null-output handler in interpreter head. Specify fallback behavior. Add to integration test suite. | ML Design |
| Node 5 · Coverage Score — Design intent: Weighted combination of router confidence, expert freshness, and domain relevance produces accurate uncertainty signal surfaced to user | ||||||||
| N5 | Signal | More | Coverage score overestimates actual coverage — confident signal in low-coverage domain | Router overconfidence; calibration drift; distribution shift in query domain not detected by router | H User receives confident response in domain where model is unreliable. Hallucination presented as high-confidence output. Degrades DH component of IpMW to near zero. Most dangerous failure mode in the architecture. | None beyond router confidence — this is the primary open risk in the architecture. | Implement external calibration validation. Periodically test coverage score accuracy against known-answer queries in boundary domains. Alert if DH drops below threshold. This is the highest-priority action item in this register. | ML Design / QA |
| N5 | Signal | Less | Coverage score underestimates actual coverage — false modesty in well-covered domain | Router underconfidence; conservative calibration; stale freshness timestamp despite good expert quality | M User unnecessarily warned of uncertainty in domain where model is reliable. Erodes trust in the coverage score signal. System appears less capable than it is. | User experience degradation only. No safety consequence. | Include underconfidence detection in calibration testing. Tune freshness decay function to avoid excessive staleness penalty for high-quality experts. | ML Design |
| Node 6 · User Interface — Design intent: Query received from user; response and coverage score delivered to user; no personal context transmitted externally | ||||||||
| N6 | Flow | As Well As | Personal context transmitted alongside query to external service | Third-party application integration that logs queries; UI layer sending context to analytics endpoint; developer error in application built on architecture | H Privacy architecture bypassed at the application layer. Personal context reaches external infrastructure despite architectural protections in the model layer. | Architecture prevents model-layer transmission. Application layer is outside the core trust boundary. | Publish clear application developer guidelines prohibiting query logging with personal context. Define privacy API contract. Consider OS-level enforcement of query privacy for applications using the architecture. | Platform / Policy |
| N6 | Signal | No | Coverage score not surfaced to user — uncertainty signal suppressed | Application developer chooses not to display coverage score; UI implementation omits uncertainty display | M User receives confident-appearing response with no uncertainty signal. Degradation honesty benefit of architecture lost at presentation layer. Effectively equivalent to architectures without coverage score. | None — application layer decision. | Specify minimum coverage score display requirement in API contract. Applications that suppress coverage score below threshold should surface a generic uncertainty indicator. | Platform / Policy |
The register contains 14 credible deviations across 6 nodes, with severity ratings of High (5), Medium (7), and Low (2). The most significant finding — rated High severity with no existing safeguard — is the coverage score overconfidence scenario in Node 5. This is the architecture's primary open risk: a router that confidently directs queries to an unsuitable expert produces a confident-appearing response in a domain where the model is unreliable, and the coverage score fails to warn the user. This is not a flaw unique to this architecture. It is the central unsolved problem in AI uncertainty quantification, surfaced explicitly by the HAZOP methodology.
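The highest-priority action item in the register — external calibration validation of the coverage score — admits a simple sketch: probe the system with known-answer queries in boundary domains and compare the mean coverage score against empirical accuracy. A minimal illustration; the function names and the tolerance value are assumptions, not part of the register:

```python
def calibration_gap(probes: list[tuple[float, bool]]) -> float:
    """probes: (coverage_score, answered_correctly) pairs from
    known-answer queries in boundary domains. Returns mean coverage
    minus empirical accuracy; a large positive gap means the coverage
    score is overconfident (the N5 'More' deviation)."""
    if not probes:
        raise ValueError("no probes supplied")
    mean_cov = sum(score for score, _ in probes) / len(probes)
    accuracy = sum(1 for _, ok in probes if ok) / len(probes)
    return mean_cov - accuracy

def overconfident(probes: list[tuple[float, bool]],
                  tolerance: float = 0.10) -> bool:
    """Alert condition: coverage exceeds accuracy by more than tolerance."""
    return calibration_gap(probes) > tolerance
```

A gap in the opposite direction (negative) would correspond to the N5 "Less" deviation — false modesty in a well-covered domain.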
→ [4, 5]
The HAZOP register confirms that the broadcast sync architecture was designed with conservative failure in mind. The majority of identified deviations have existing safeguards — the architecture degrades gracefully rather than failing catastrophically. This is consistent with a design that treats offline operation and uncertainty as first-class properties rather than edge cases.
But the register also surfaces three genuine open questions that the architecture does not fully answer.
→ [4, 5, 6]
The most important principle in process safety is not the HAZOP itself. It is the principle that underlies it: failure modes must be decided, not discovered. In process engineering, how a system fails is a design decision made by a qualified engineer at the earliest stage of design, before hardware is specified. The valve does not decide how to fail. The process engineer decides, based on which failure mode is safest for that specific point in the system, at that specific operating condition, in that specific process context.
This decision is then documented, reviewed, challenged by a multidisciplinary team, and enforced in hardware. If the process changes, the failure mode analysis is revisited. There is always an answer to the question: who decided how this fails, and when?
AI systems have no equivalent. The question of how an AI system fails — what it does when connectivity is lost, when it receives out-of-distribution input, when its confidence is wrong, when the user acts on an incorrect response — is almost never asked at the design stage. It is answered in production, after something goes wrong, by whoever happens to be responsible at the time.
This is not a criticism of AI developers. It reflects the absence of a methodology. Process engineering did not develop conservative failure mode design through intuition — it developed it through catastrophic accidents and the formal frameworks those accidents inspired. The AI industry has not yet had its Flixborough. As AI systems become more physically consequential — controlling vehicles, managing medical devices, operating infrastructure — it will.
The question is not whether AI systems will fail. They will. The question is whether someone decided how they fail, before they did.
→ [1, 2, 6]
The following outlines what an AI HAZOP standard should require for consequential deployments. Consequential is defined as any AI system where the output directly influences a physical, medical, financial, or safety-critical decision.
This standard does not require new technology. It requires applying a methodology that has been refined over sixty years to a new class of system. The tools — guide words, node analysis, register format — are the same. The parameters are different. The discipline required is the same.
Process engineering did not invent the concept of conservative failure mode design. It systematized it. The AI industry needs to do the same, and it does not need to wait for a catastrophic accident to begin.
→ [1, 2, 3, 7]
Process engineering documents complex systems using Piping and Instrumentation Diagrams — drawings that show not just topology but behavior, failure modes, control loops, instrument tags, and version history. These properties are what make formal safety analysis possible. AI system architecture is currently documented with block diagrams that show topology and nothing else. As AI systems become more physically consequential, block diagrams are insufficient. This paper proposes an AI P&ID notation standard — a symbol library, tag numbering convention, and drawing format adapted from ANSI/ISA-5.1-2024 and ISO 10628 — that gives AI systems the documentation discipline that process engineering developed over seventy years. The standard is demonstrated on the broadcast sync architecture from the companion paper series, producing the first AI P&ID in the proposed notation.
This paper is a companion to: Horvat, J. (2026). Intelligence at the Edge of the Cloud: A Local Architecture for the AI Era. Public domain.
A process engineer handed a block diagram of a chemical plant would not be able to conduct a HAZOP, write an operating procedure, specify a control system, or approve the design for construction. A block diagram shows components and connections. It does not show what flows between components, what data types those connections carry, how each component fails, what instruments monitor what parameters, what interlocks prevent unsafe states, or what version of the design is being reviewed. It is a sketch, not a document.
AI system architecture documentation in 2026 is almost entirely block diagrams. Boxes connected by arrows, sometimes with labels. When teams use more sophisticated tools they produce UML diagrams or data flow diagrams — both of which are improvements, but neither of which captures the properties required for safety-critical system documentation. None of them show failure modes. None of them show what happens when a connection is lost. None of them have a tag numbering system that connects the diagram to a HAZOP register. None of them have formal version control in the engineering sense.
This is not a criticism of the people producing these diagrams. There is no standard. There are no conventions. Nobody has defined what a complete AI system diagram should contain. The result is that every team invents its own notation, diagrams are not comparable across organizations, and the formal safety analysis that requires a precise drawing cannot be done.
A block diagram tells you what a system is. A P&ID tells you how it behaves, how it fails, and who is responsible for each part of it. Consequential AI systems need the second kind of document.
| Property | Block Diagram | AI P&ID (Proposed) |
|---|---|---|
| Component topology | ✓ Shows components and connections | ✓ Shows components and connections |
| Data types on connections | ✗ Not specified | ✓ Data type, size, and direction annotated on every line |
| Failure modes | ✗ Not shown | ✓ Every component shows fail-safe, fail-operational, or fail-stop |
| Instrument tags | ✗ No tagging convention | ✓ ISA-adapted tag numbering for every instrument and control loop |
| Control loops | ✗ Not represented | ✓ Monitoring instruments, controllers, and final control elements shown |
| Security boundaries | ✗ Implied at best | ✓ Enclave boundaries shown as defined zones with access annotations |
| Version control | ✗ Informal or absent | ✓ Revision block, date, change history, management of change required |
| HAZOP support | ✗ Insufficient for node analysis | ✓ Node boundaries defined; parameters readable from drawing |
| Legal standing | ✗ Informal communication tool | ✓ Controlled document with formal change management |
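The contrast in the table can be made concrete in code. A minimal sketch with hypothetical type names (`Edge`, `Connection`, and `FailureMode` are illustrative, not part of the proposed standard): a block-diagram edge records topology only, while a P&ID-grade connection record carries data type, size, direction, and failure behavior.

```python
from dataclasses import dataclass
from enum import Enum

class FailureMode(Enum):
    """Failure behaviors from the proposed notation."""
    FAIL_LAST = "FL"    # continue from last known state
    FAIL_OPEN = "FO"    # bypass the failed element, continue degraded
    FAIL_CLOSED = "FC"  # cease operation entirely

@dataclass(frozen=True)
class Edge:
    """What a block diagram records: topology only."""
    src: str
    dst: str

@dataclass(frozen=True)
class Connection:
    """What a P&ID-grade record carries: topology plus behavior."""
    src_tag: str          # e.g. "WM-001"
    dst_tag: str          # e.g. "APM-001"
    data_type: str        # annotated on the line, e.g. "embedding tensor"
    approx_size: str      # illustrative figure, e.g. "~4 KB/query"
    direction: str        # "one-way" or "two-way"
    on_loss: FailureMode  # downstream behavior if the connection drops

# The same link, described both ways:
block = Edge("WorldModel", "PersonalModel")
pid = Connection("WM-001", "APM-001", "embedding tensor",
                 "~4 KB/query", "one-way", FailureMode.FAIL_OPEN)
```

The second record is what a HAZOP needs: every field answers a question the guide words will ask.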
→ [1, 2]
A Piping and Instrumentation Diagram, as defined by ANSI/ISA-5.1-2024 and ISO 10628, is a detailed drawing of a process system showing all equipment, piping, instrumentation, and control systems with sufficient detail to support design, construction, operation, and safety analysis. It is not a schematic. It is not a concept diagram. It is an engineering document.
The properties that make a P&ID useful for safety analysis are precisely defined by the standards. Equipment is shown with its tag number, size, material of construction, and design conditions. Pipe lines are shown with their size, schedule, material, insulation, and direction of flow. Instruments are shown with tag numbers following the ISA letter-based convention — a pressure indicator is PI, a flow controller is FC, a temperature transmitter is TT. Each tag includes a loop number that connects it to the instrument data sheet and the control system documentation.
Most importantly for safety analysis, every valve shows its failure mode — fail closed (FC), fail open (FO), or fail in last position (FL). Every instrument shows its action on signal failure. Every safety interlock is shown with its trip setpoint. Every relief device is shown with its set pressure. The drawing contains enough information to ask, for every component: what happens when this fails, and does the system respond safely?
The drawing also has a title block with revision history. Every change to a P&ID triggers a management of change process. The revision number, date, description of change, and responsible engineer are recorded. You cannot change a P&ID informally. The version history is part of the document.
→ [1, 2, 3]
The following symbol library defines the graphical elements required to document AI systems in the proposed notation. Symbols follow the ISA convention of using circles for instruments, boxes for equipment, and lines for connections, adapted for AI-specific element types. Where no process equivalent exists, new symbols are defined with rationale.
| Tag | Element Name | Description | Failure Mode Annotation | ISA/Process Analog |
|---|---|---|---|---|
| Category 01 · Models — Primary processing elements | ||||
| WM-XXX | World Model | Sparse expert inference model. Pull-only. Receives weight syncs. Responds to queries from augmented personal model only. | FL — Fail last known weights. Continues serving from current weights if sync unavailable. | Process vessel / reactor — primary transformation element |
| APM-XXX | Augmented Personal Model | On-device model with frozen base, personal adapters, and interpreter head. Inside secure enclave. Primary user-facing element. | FL — Fail last state. Responds from personal adapters if world model unavailable. | Process vessel with internal components — complex transformation element |
| IH-XXX | Interpreter Head | Fixed non-trainable cross-attention module. Frozen at compile time. Translates between world model and personal model embedding spaces. Internal to APM. | FO — Fail open to personal model only. If cross-attention fails, synthesize from personal adapters without world model input. | Heat exchanger / translator — embedded transformation element |
| Category 02 · Adapters — Trainable parameter sets | ||||
| PA-XXX | Personal Adapter Set | Low-rank adapter layers encoding user patterns. Trainable. Updated by behavioral observation only. Never transmitted. Inside secure enclave. | FL — Fail last state. Adapter weights frozen if training signal unavailable. System continues at current calibration level. | Control valve with positioner — trainable final control element |
| Category 03 · Instruments — Monitoring and measurement elements | ||||
| CI-XXX | Coverage Indicator | Monitors and displays current domain coverage score. Reads router activation confidence, expert freshness, and domain relevance. Surfaces signal to user interface. | FL — Fail to last known coverage state. If coverage computation fails, display last valid score with staleness timestamp. | Level indicator / analyzer — passive readout instrument (ISA circle symbol) |
| ST-XXX | Sync Transmitter | Monitors sync channel status — last sync timestamp, sync success/failure, bytes received. Transmits status signal to coverage indicator and system monitor. | FL — Fail last state. If sync monitoring fails, retain last known sync timestamp. | Flow transmitter / status transmitter — signal-originating instrument (ISA circle) |
| MA-XXX | Model Analyzer | Monitors world model version, expert inventory, and calibration state. Analogous to a process analyzer — reads composition of active expert set. | FL — Fail last inventory. If model analyzer fails, retain last known expert inventory list. | Composition analyzer — quality measurement instrument (ISA circle with A) |
| Category 04 · Connections — Data flow lines | ||||
| — | Primary Data Flow | Main inference data path. Annotated with data type, approximate size, and direction. Heavy solid line. | N/A | Main process line — primary fluid flow |
| — | Signal / Control Line | Instrument signal, control output, or monitoring connection. Dashed line. Used for coverage score output, monitoring signals, and control loops. | N/A | Instrument signal line — electrical or pneumatic signal |
| — | Sync Channel | Weight broadcast channel from sync server to device. One-directional. Annotated with sync frequency, payload size, and anonymization method. | N/A — channel loss results in FL behavior on world model | Utility line — secondary service connection |
| Category 05 · Boundaries — Security and system zones | ||||
| SE-XXX | Secure Enclave Boundary | Hardware secure enclave boundary. Dashed red rectangle. All elements inside this boundary are hardware-isolated. No data crosses boundary without explicit interface definition. | FC — Fail closed. If enclave is compromised, all contained elements cease operation. No degraded operation outside enclave boundary. | Battery limit / system boundary — physical plant boundary |
| DB-XXX | Device Boundary | Physical device boundary. Light dashed rectangle. Defines what is on-device vs off-device. All primary inference operations should be within this boundary. | N/A — boundary definition only | Equipment boundary — physical equipment limit |
The symbol library is not exhaustive. It defines the minimum set required to document the broadcast sync architecture. A full standard would include additional categories for attention mechanisms, embedding layers, training loops, and deployment infrastructure. The symbol set should be extended by the standards body that adopts it, following the ISA model of periodic revision with industry consensus.
→ [1, 2, 4]
The ISA tag numbering system uses a letter-based convention where the first letter identifies the measured variable, subsequent letters identify the function, and a loop number makes each tag unique. FIC045 is the Flow Indicating Controller in loop 045. The convention has been in use since 1949 and is universally understood by process engineers.
The proposed AI P&ID tag convention adapts this structure directly. The first letter identifies the AI element type, the second identifies its function, and a sequential number makes it unique within the system.
| Letter | Type / First Position | Function / Second Position | Example Tag |
|---|---|---|---|
| M | Model | — | WM-001 (World Model 001) |
| A | Adapter | — | PA-001 (Personal Adapter 001) |
| I | Interpreter | I (Indicator) | IH-001 (Interpreter Head 001) |
| C | Coverage | I (Indicator) · T (Transmitter) · C (Controller) | CI-001 (Coverage Indicator 001) |
| S | Sync | T (Transmitter) · V (Valve/Gate) · A (Alarm) | ST-001 (Sync Transmitter 001) |
| W | Weight | T (Transmitter) · I (Indicator) | WT-001 (Weight Transmitter 001) |
| E | Enclave | B (Boundary) | SE-001 (Secure Enclave 001) |
| D | Device | B (Boundary) | DB-001 (Device Boundary 001) |
| U | User Interface | I (Interface) | UI-001 (User Interface 001) |
| X | External System | I (Interface) · S (Server) | XS-001 (External Sync Server 001) |
The tag numbering convention enables the AI P&ID to connect to a HAZOP register exactly as a process P&ID does. Every row in the HAZOP register references a tag. Every action item in the register references a tag. When the register says "SA-001 — Sync Alarm does not activate when sync interval exceeded," both the register and the drawing refer to the same uniquely identified element. The documentation system is self-consistent.
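That self-consistency is checkable by machine. A hypothetical validator for the convention — a two- or three-letter uppercase prefix, a hyphen, and a three-digit loop number — shows how tooling could enforce tag agreement between drawing and register (`parse_tag` and the regex are illustrative assumptions, not part of the proposed standard):

```python
import re

# Proposed tag shape: 2-3 uppercase letters, hyphen, 3-digit loop number.
TAG_RE = re.compile(r"^([A-Z]{2,3})-(\d{3})$")

def parse_tag(tag: str) -> tuple[str, int]:
    """Split a tag like 'CI-001' into its letter prefix and loop number.

    Raises ValueError for anything that does not match the convention,
    so a register row referencing a malformed tag fails loudly at
    document-check time rather than during an incident review."""
    m = TAG_RE.match(tag)
    if not m:
        raise ValueError(f"not a valid AI P&ID tag: {tag!r}")
    return m.group(1), int(m.group(2))
```

For example, `parse_tag("CI-001")` yields the prefix `"CI"` and loop number `1`, while lowercase or unpadded forms are rejected.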
→ [1, 3]
The following AI P&ID documents the broadcast sync architecture from Intelligence at the Edge of the Cloud using the proposed notation. This is the first AI P&ID produced in this notation. The drawing should be read in conjunction with the HAZOP register in the companion paper — every node in that register corresponds to a tagged element on this drawing.
The drawing shows elements that no block diagram of this architecture has previously shown: the sync transmitter ST-001 monitoring the broadcast channel, the sync alarm SA-001 that activates when sync is overdue, the model analyzer MA-001 reading the expert inventory, the coverage indicator CI-001 generating the coverage score signal, and the explicit failure mode annotation on every element. The secure enclave boundary SE-001 is a defined zone, not an implied property.
A process engineer reading this drawing can immediately identify the HAZOP nodes, the instrument loop numbers, and the failure modes. The drawing and the HAZOP register from the companion paper are designed to be used together — every node in that register has a corresponding tagged element on this drawing.
→ [4, 5]
A P&ID without version control is not a P&ID. It is a snapshot. The version control and change management requirements are as important as the notation itself — they are what give the document its legal and operational standing.
In process engineering, every change to a P&ID triggers a Management of Change process. The change is described, the affected nodes are identified, the HAZOP register is updated for affected rows, the change is reviewed by a qualified engineer, and the revision block on the drawing is updated with the revision number, date, description, and responsible engineer. You cannot make an informal change to a P&ID. The document history is part of the document.
| Trigger Event | Required Action | Owner |
|---|---|---|
| New model version synced — architecture change | Update WM-XXX tag with new version number. Review affected HAZOP rows. Re-check IH version compatibility. Issue new drawing revision. | ML Arch. |
| New deployment context — physical AI, medical, infrastructure | Full re-HAZOP of affected nodes. Update failure mode annotations for new consequence profile. New drawing revision with context annotation. | Safety Eng. |
| New external integration added to UI-XXX | Add integration to drawing with data flow type and direction. HAZOP Node 6 re-analysis for new interface. Update privacy boundary annotation. | Platform Eng. |
| Secure enclave hardware changed | Update SE-XXX annotation with new hardware specification. Re-validate failure mode assumptions for enclave boundary. Security audit triggered. | Security Eng. |
| Adapter training mechanism changed | Update PA-XXX with new training description. Re-HAZOP Node 3. Update failure mode annotation if drift or over-training behavior changes. | ML Design |
| Coverage score algorithm changed | Update CI-XXX with new algorithm description. Re-HAZOP Node 5 — coverage overconfidence is the highest-severity open risk. Calibration validation required before release. | ML Design / QA |
| Production incident with safety consequence | Immediate drawing review. Identify affected tag(s). Determine if drawing was accurate — if not, issue corrective revision. Trigger full HAZOP revalidation for affected nodes. | Safety Eng. |
The drawing title block on AI-PID-001 shows revision A. Every subsequent change to the architecture that affects the drawing triggers a new revision — B, C, D — with the date, change description, and responsible engineer recorded. The revision history is permanent. You cannot go back and un-document a change.
This is exactly what process engineering requires of its P&IDs, and for exactly the same reason: when something goes wrong, the first question is always "what did the drawing say, and was the system built to the drawing?" That question requires a drawing that was maintained as a controlled document, not a slide that was last updated eighteen months ago.
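The append-only property of the revision history can be sketched directly. `ControlledDrawing` and `Revision` are hypothetical helpers, not part of the proposed standard: revisions can be added, but past entries are never edited or removed.

```python
from dataclasses import dataclass
from typing import List

@dataclass(frozen=True)
class Revision:
    rev: str          # "A", "B", "C", ...
    date: str         # ISO date (illustrative values below)
    description: str
    engineer: str     # responsible engineer of record

class ControlledDrawing:
    """Minimal sketch of a controlled document: the revision block
    grows monotonically and the history is exposed read-only."""
    def __init__(self, number: str, first: Revision):
        self.number = number
        self._history: List[Revision] = [first]

    def revise(self, rev: Revision) -> None:
        """Append a new revision; nothing is ever overwritten."""
        self._history.append(rev)

    @property
    def current(self) -> Revision:
        return self._history[-1]

    @property
    def history(self) -> tuple:
        # A tuple, so callers cannot mutate past revisions in place.
        return tuple(self._history)

# Illustrative use, mirroring the AI-PID-001 example:
drawing = ControlledDrawing(
    "AI-PID-001",
    Revision("A", "2026-01-15", "Initial issue", "J. Horvat"))
drawing.revise(
    Revision("B", "2026-03-01", "Coverage score algorithm change", "ML Design"))
```

The design choice to expose `history` as a tuple is the code-level analog of "you cannot go back and un-document a change."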
→ [2, 3, 6]
ANSI/ISA-5.1 was first published in 1949 as ISA Recommended Practice RP-5.1. It has been revised multiple times — 1984, 1992, 2009, 2022, 2024 — as the process industries evolved and new technologies required new symbols. The standard did not emerge from a single paper. It emerged from industry consensus over decades, with ISA as the standards body that organized and formalized that consensus.
The same path is available for AI P&ID notation. The symbols and conventions proposed in this paper are a starting point, not a finished standard. What is needed is a standards body willing to take it on, an industry working group willing to develop it, and a critical mass of organizations willing to adopt it.
The most natural home is ISA itself — the International Society of Automation already owns the process P&ID standard and has the expertise and infrastructure to extend it. An AI P&ID standard could be designated ISA-5.5 or ISA-5.X, positioned explicitly as an extension of the existing ISA-5.1 instrumentation and control standards family. The connection to existing standards is a strength — it means process engineers who already use ISA-5.1 can read an AI P&ID with minimal additional training.
Alternatively, IEEE, IEC, or a new AI-specific standards body could adopt the notation. What matters is not which body adopts it but that some body does — that the notation becomes standardized, versioned, and maintained as the technology evolves.
ISA-5.1 has been in use for seventy-five years because someone decided that a common notation was worth the effort of standardizing. AI systems will be deployed for at least as long. The time to define the notation is before the accidents, not after.
→ [1, 2, 7]
This paper is a companion to: Horvat, J. (2026). Intelligence at the Edge of the Cloud: A Local Architecture for the AI Era. Public domain.
The edge AI architecture proposed in the companion paper claims that personal adapter weights train through behavioral observation — without supervised labels, explicit feedback, or transmitted data. This paper closes that specification gap. We define the Behavioral Preference Signal (BPS): a set of implicit interaction events — query reformulation, session continuation depth, session termination timing, and follow-up query structure — that together constitute a noisy but aggregate-reliable proxy for user preference. We propose a constrained low-rank gradient update rule, derived from Direct Preference Optimization, that uses BPS scores as implicit preference labels; impose drift constraints that guard against overfitting and adversarial manipulation; and define a three-tier validation design. The mechanism is trainable on-device, requires no server communication, produces no supervisory signal that could be intercepted or compelled, and is grounded in established preference learning theory.
The companion architecture paper makes a precise claim: personal adapter weights update through behavioral observation only — no supervised loss, no explicit feedback, no transmitted gradient. The user is never asked to rate anything. The adapters become calibrated through the natural shape of interaction.
That claim is architecturally motivated and privacy-correct. But it is underspecified. What exactly does the system observe? What constitutes a training signal? How does an observation translate into a weight update? What prevents the adapter from drifting toward a bad state if the observations are noisy or adversarially crafted?
These are not rhetorical questions. They are engineering requirements. An architecture that leaves them unanswered is a proposal, not a blueprint. This paper answers them.
The key insight is that behavioral observation is not a new idea — it is the implicit form of the preference learning that explicit methods like RLHF and DPO already do. The difference is who provides the preference signal. In RLHF, a human annotator compares two responses and clicks a button. In the mechanism proposed here, the user's natural interaction behavior provides the same information — not explicitly, not consistently, but reliably in aggregate. The system observes what the user does after receiving a response, and treats that behavior as implicit evidence about whether the response served the user's need.
The preference signal has always been in the interaction. Most systems just never looked for it.
→ [1, 2, 3]
The Behavioral Preference Signal (BPS) is a scalar score assigned to each query-response pair based on the user's next observable action. It is computed on-device, immediately after the action occurs, and is never transmitted. It serves as the implicit preference label that drives the adapter update.
The signal is drawn from four scored event types plus a neutral no-action case, each with a time window and a score:
| Event | Window | BPS Score | Rationale |
|---|---|---|---|
| Query reformulation — user rephrases the same intent | < 45 seconds of response | −1.0 | Unambiguous negative signal. The response did not serve the user's need. The reformulation reveals what the response missed — level, framing, or domain. |
| Deep continuation — follow-up query builds on response without restating | < 90 seconds of response | +1.0 | Strong positive signal. The user understood the response and extended it. The structure of the follow-up reveals what the response communicated successfully. |
| Session end after sustained engagement (> 3 exchanges) | After final response | +0.4 | Weak positive. Extended engagement without reformulation suggests the response pattern was broadly useful. Noisy — user may have simply run out of time. |
| Immediate session termination after response | < 15 seconds of response | −0.3 | Weak negative. Could indicate the response was complete and satisfying, or that the user gave up. Treated as weak negative only when combined with short session history. |
| No action — user reads response, session pauses | > 120 seconds inactivity | 0.0 | Neutral. Insufficient signal to determine preference. No adapter update triggered. |
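The scoring table translates directly into a lookup function. A minimal sketch; the event-name strings and the `session_exchanges` guard are illustrative conventions, not normative:

```python
def bps_score(event: str, seconds_after: float,
              session_exchanges: int = 0) -> float:
    """Map an observed post-response event to its BPS score per the table.

    seconds_after is the delay between response delivery and the event;
    session_exchanges is the number of exchanges so far in the session."""
    if event == "reformulation" and seconds_after < 45:
        return -1.0   # unambiguous negative: response missed the need
    if event == "deep_continuation" and seconds_after < 90:
        return +1.0   # strong positive: user built on the response
    if event == "session_end" and session_exchanges > 3:
        return +0.4   # weak positive: sustained engagement
    # Weak negative only when session history is short (table caveat)
    if (event == "immediate_termination" and seconds_after < 15
            and session_exchanges <= 3):
        return -0.3
    return 0.0        # neutral — no adapter update triggered
```

Anything outside the defined windows collapses to the neutral case, which is the conservative default: no signal, no update.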
The BPS is intentionally simple. More sophisticated signals — dwell time, scroll behavior, copy actions — could be incorporated but introduce complexity without proportional benefit. The two primary signals (reformulation and deep continuation) are sufficient to distinguish responses that served the user from those that did not, in aggregate, over many interactions.
Individual BPS observations are noisy. A user might reformulate because they changed their mind, not because the response failed. A user might continue a session despite a mediocre response because the task required it. The mechanism does not need individual observations to be accurate. It needs the aggregate signal over many interactions to be directionally correct — which empirical work on revealed preference theory suggests it will be, given sufficient interaction volume.
→ [2, 4, 5]
The BPS mechanism is not invented from whole cloth. It is a privacy-preserving instantiation of two well-established learning frameworks: Direct Preference Optimization and revealed preference theory from economics.
DPO (Rafailov et al., 2023) showed that RLHF's complex reward modeling pipeline can be replaced by a simple binary cross-entropy objective over preference pairs. Given two responses to the same query — one preferred, one dispreferred — DPO directly optimizes the policy to increase the relative probability of the preferred response. No reward model. No RL loop. Just a classification loss over pairs.
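The pairwise objective DPO optimizes can be stated explicitly. In the standard notation, with $y_w$ the preferred response, $y_l$ the dispreferred response, $\pi_{\mathrm{ref}}$ the frozen reference policy, and $\beta$ the KL-constraint coefficient:

```latex
\mathcal{L}_{\mathrm{DPO}}(\pi_\theta;\,\pi_{\mathrm{ref}}) =
 -\,\mathbb{E}_{(x,\,y_w,\,y_l)\sim\mathcal{D}}
 \left[\log \sigma\!\left(
   \beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\mathrm{ref}}(y_w \mid x)}
   - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\mathrm{ref}}(y_l \mid x)}
 \right)\right]
```

Minimizing this binary cross-entropy over preference pairs increases the relative probability of $y_w$ over $y_l$ without ever fitting an explicit reward model.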
The BPS mechanism adapts DPO to the implicit, on-device setting. Instead of human annotators providing preference labels, the system derives implicit preference labels from behavioral events. A query-response pair that triggers deep continuation is the "preferred" response in the DPO sense. A pair that triggers reformulation is the "dispreferred" response. The update rule is a DPO-derived gradient step on the adapter weights — not the full model, only the thin low-rank delta — in the direction that increases the relative probability of continuation-triggering responses over reformulation-triggering ones.
The critical difference from standard DPO is the source of the preference signal and the scope of the update. Standard DPO updates the full model on explicit human labels aggregated at a server. The BPS mechanism updates only adapter weights on implicit behavioral labels observed entirely on-device. The mathematical structure is the same. The privacy properties are categorically different.
In economics, revealed preference theory holds that a person's true preferences can be inferred from their choices, even when those choices are not explicitly framed as preference statements. A consumer who repeatedly buys product A over product B reveals a preference for A, regardless of what they say when asked directly.
The BPS is an application of this principle to conversational AI interaction. A user who consistently continues sessions after responses of a certain type reveals a preference for that type, regardless of whether they would articulate it explicitly. The behavioral signal reveals the preference. The adapter learns from what the user does, not what the user says they want.
This grounding matters for two reasons. First, it provides a theoretical basis for why the aggregate signal is reliable even when individual observations are noisy — revealed preferences aggregate in the same direction as true preferences over sufficient data. Second, it connects the mechanism to a literature in economics and decision theory, stretching back to Samuelson's 1938 formulation, that has validated the approach in many domains.
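The aggregation claim can be illustrated with a toy simulation: if the true preference is positive but each individual observation flips sign with some noise probability, the mean observed signal stays directionally correct. The function name and numeric choices are illustrative:

```python
import random

def simulate_mean_bps(p_flip: float, n: int, seed: int = 0) -> float:
    """True preference is +1 per interaction; each observation is
    independently flipped to -1 with probability p_flip (noise).
    Returns the mean observed signal over n interactions."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        s = 1.0
        if rng.random() < p_flip:
            s = -s
        total += s
    return total / n
```

Even with 30% of observations flipped, the expected mean is $1 - 2 \times 0.3 = 0.4$ — clearly positive over ten thousand interactions, which is the sense in which the signal is "aggregate-reliable."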
→ [1, 3, 5, 6]
The update rule specifies how BPS scores translate into adapter weight changes. It has three components: the gradient direction, the magnitude constraint, and the recency weighting.
Pseudocode — On-device BPS adapter update loop (simplified for clarity)
for each interaction session:
    events = observe_session(query_response_pairs)
    scored = assign_bps(events)                    # BPS table from Section II
    pairs = construct_preference_pairs(scored)     # each pair carries its event timestamp
    for (q, r_pos, r_neg, score, ts) in pairs:
        if abs(score) < threshold:                 # Skip near-zero signal
            continue
        grad = compute_bps_dpo_gradient(
            adapter_weights,
            q, r_pos, r_neg,
            base_model_ref,
            beta=0.1                               # KL constraint coefficient
        )
        clipped = clip_by_norm(grad, delta_max)    # delta_max = 1% of adapter norm
        w_t = recency_weight(ts)                   # Decay weight from event age
        adapter_weights += eta * clipped * w_t     # eta = base learning rate
    # Drift check after each session
    if adapter_norm_delta() > 0.15 * initial_norm:
        eta *= 0.5                                 # Halve learning rate
        log_warning("Drift threshold approached")
The pseudocode is intentionally implementable. Every variable is specified. The drift check is explicit. The base model reference is included. This is not a sketch — it is a complete algorithm that could be ported to a LoRA training loop on any device that supports gradient computation in the personal adapter layer.
→ [1, 7, 8]
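The loop relies on two helpers, `clip_by_norm` and `recency_weight`, that the pseudocode leaves unspecified. One minimal realization is sketched below, assuming gradients as plain Python lists (a real implementation would operate on LoRA weight tensors) and an exponential decay with an illustrative 30-day half-life; the half-life value is an assumption, not a constant fixed by this paper:

```python
import math
import time

def clip_by_norm(grad, max_norm):
    """Scale a gradient vector down so its L2 norm does not exceed
    max_norm; leave it unchanged otherwise."""
    norm = math.sqrt(sum(g * g for g in grad))
    if norm <= max_norm or norm == 0.0:
        return list(grad)
    scale = max_norm / norm
    return [g * scale for g in grad]

def recency_weight(event_ts, now=None, half_life_days=30.0):
    """Exponential decay on event age: a signal half_life_days old
    counts half as much as a fresh one. The 30-day half-life is an
    illustrative choice."""
    now = time.time() if now is None else now
    age_days = max(0.0, (now - event_ts) / 86_400.0)
    return 0.5 ** (age_days / half_life_days)
```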
The BPS mechanism has two vulnerability surfaces: natural drift from noisy signals, and adversarial manipulation by a user deliberately trying to steer the adapter in a particular direction.
Noisy individual observations accumulate. A sequence of unusual sessions — travel, illness, a deadline — might produce a cluster of atypical behavioral signals that push adapter weights away from the user's normal pattern. The recency decay partially addresses this: old signals fade. But the cumulative drift constraint is the primary safeguard. If adapter weights have moved more than 15% of their initial norm from the base model, the learning rate halves automatically. This acts like a governor — the adapter continues learning but cannot drift far enough from the base model to lose general capability.
The 15% threshold is a design parameter, not a derived constant. A tighter threshold (5–10%) produces a more conservative adapter that calibrates slowly. A looser threshold (20–30%) allows more personalization but increases the risk of capability loss. The right value depends on deployment context and should be empirically validated in the human pilot study described in Section VI.
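The governor's behavior can be sketched with a toy trace. The simulation below is illustrative (the function `run_with_governor` and its inputs are not a specified component of the mechanism): it applies a worst-case sequence of maximal 1%-of-norm updates and shows how repeated halving caps cumulative drift just under 17% of the initial norm in this trace, despite the updates never stopping:

```python
def run_with_governor(step_norms, initial_norm, eta0=1.0, drift_frac=0.15):
    """Apply a sequence of per-update norm movements under the drift
    governor: once cumulative drift exceeds drift_frac of the initial
    adapter norm, the learning rate halves (and halves again on each
    subsequent breach), so the adapter keeps learning but cannot run
    away from the base model."""
    eta, drift, halvings = eta0, 0.0, 0
    for s in step_norms:
        drift += eta * s
        if drift > drift_frac * initial_norm:
            eta *= 0.5
            halvings += 1
    return drift, eta, halvings

# 40 updates, each moving 1% of an initial adapter norm of 100:
drift, eta, halvings = run_with_governor([1.0] * 40, initial_norm=100.0)
# The threshold (15.0) is crossed at update 16; every later update
# halves eta, so total drift converges geometrically toward 17.0.
```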
A user who understands the mechanism could deliberately generate reformulation and continuation signals to steer the adapter. This is not a theoretical attack — it is a practical one that a technically sophisticated user could attempt.
The gradient clipping is the primary defense. A maximum norm constraint of 1% of adapter norm per update means that even a deliberate sequence of high-magnitude BPS events cannot move the adapter significantly in a single session. Sustained manipulation over many sessions would be needed to produce meaningful drift — and even then, the 15% cumulative norm constraint limits the achievable manipulation.
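A back-of-envelope bound follows directly from the two stated constants. This sketch assumes the worst case for the defense: every adversarial update achieves the maximum clipped magnitude in a perfectly consistent direction.

```python
import math

max_step_frac = 0.01   # clipping: each update moves <= 1% of adapter norm
drift_cap_frac = 0.15  # governor: learning rate halves past 15% drift

# Lower bound on the number of maximally effective adversarial updates
# needed just to reach the governor threshold; past that point the
# halving learning rate makes further progress geometric, not linear.
min_updates = math.ceil(drift_cap_frac / max_step_frac)  # → 15
```

Fifteen sessions of sustained, maximally effective manipulation is the floor, not the expectation; realistic behavioral signals fall well short of the clip bound.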
A sophisticated adversary with physical device access could in principle attempt sustained manipulation — but this is precisely the threat the hardware secure enclave is designed to address, reducing it from a remote software attack to a local physical one. The more important observation is that the attack surface is narrow. The adapter encodes the user's patterns, not external world knowledge. A user manipulating their own adapter produces an adapter that serves their manipulated preferences — which is arguably the correct behavior. The security concern is not that a user manipulates their own model, but that a malicious actor manipulates another user's model. That attack requires physical access to the device, which is the secure enclave's job to prevent.
→ [7, 8, 9]
The BPS mechanism is a proposal, not a proof. Three validation tiers — synthetic, simulation, and human pilot — are required to establish that the mechanism works as described. This section specifies each tier in enough detail that a research team could implement it directly.
Tier 1 — Synthetic validation
Setup: Generate a dataset of synthetic interaction sequences with known preference structure. Each sequence consists of 50–100 query-response pairs drawn from a preference-consistent distribution — some user type always prefers technical depth, another always prefers brevity, a third prefers analogical reasoning.
Procedure: Run the BPS mechanism on each sequence. Assign BPS scores based on simulated behavioral events generated from the known preference structure. Apply the update rule. After N interactions, measure whether adapter weights have moved in the direction that would increase performance on held-out preference-consistent queries from the same distribution.
Success criterion: Adapter performance on held-out queries improves by at least 5% relative to base model on preference-consistent tasks. Adapter does not degrade on general capability tasks. Drift remains within bounds.
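The Tier 1 success criterion can be encoded as a simple check. This is a sketch: the function `tier1_success` and the example scores are illustrative, with scores taken as mean task accuracies in [0, 1].

```python
def tier1_success(base_pref, adapter_pref, base_gen, adapter_gen,
                  drift_ok, min_rel_gain=0.05, max_gen_drop=0.0):
    """Tier 1 criterion: the adapter improves held-out
    preference-consistent performance by at least 5% relative to the
    base model, does not degrade general-capability performance, and
    drift stayed within bounds during training."""
    rel_gain = (adapter_pref - base_pref) / base_pref
    gen_drop = base_gen - adapter_gen
    return rel_gain >= min_rel_gain and gen_drop <= max_gen_drop and drift_ok

# Example: adapter scores 0.66 vs base 0.60 on preference-consistent
# tasks (+10% relative), matches base on general tasks, drift in bounds:
ok = tier1_success(0.60, 0.66, 0.80, 0.80, drift_ok=True)
```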
Tier 2 — Simulated-user evaluation
Setup: Use an existing aligned LLM (GPT-4 class or equivalent) as a simulated user with a defined preference profile. The simulated user has a fixed "true preference" — for example, it prefers responses that use structured enumeration over prose. The preference is known to the evaluator but not encoded in the adapter at the start.
Procedure: Run interactions between the adapter-equipped model and the simulated user. The simulated user generates realistic reformulation or continuation signals based on whether each response matches its known preference profile. Apply BPS updates after each interaction. After 50, 100, and 200 interactions, evaluate whether the adapter model now matches the simulated user's preference profile better than the base model.
Success criterion: At 200 interactions, adapter model response style matches simulated user preference on blind evaluation. Reformulation rate decreases monotonically over interactions. PΔ score (from the IpMW companion paper) reaches at least +0.15.
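The monotone-decrease criterion for reformulation rate can likewise be written as a check over the 50/100/200-interaction checkpoints. The function name and the example rates below are assumptions, not measured data.

```python
def reformulation_rate_monotone(rates, tol=0.0):
    """Tier 2 criterion: reformulation rate decreases (weakly, within
    tol) across successive evaluation checkpoints."""
    return all(later <= earlier + tol
               for earlier, later in zip(rates, rates[1:]))

# Hypothetical reformulation rates at the three checkpoints:
checkpoints = {50: 0.32, 100: 0.21, 200: 0.12}
ok = reformulation_rate_monotone([checkpoints[k] for k in (50, 100, 200)])
```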
Tier 3 — Human pilot study
Setup: Recruit 8–12 participants with diverse professional backgrounds and interaction styles. Each participant uses the system as their primary AI assistant for two weeks. The system is split — half of participants use the BPS-updated adapter (treatment group), half use a frozen adapter that receives no updates (control group). Neither group is told which condition they are in.
Procedure: At days 1, 7, and 14, administer a blind preference evaluation: present each participant with 20 matched query-response pairs — one generated by their adapter model, one by the base model — and ask which they prefer without revealing the source. Record reformulation rates throughout the study period. At study end, interview participants about their experience.
Success criterion: Treatment group participants prefer their adapter model responses over base model responses at statistically significant rates by day 14. Treatment group reformulation rate decreases over the study period. Control group shows neither effect.
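The "statistically significant" criterion can be made concrete with a one-sided exact binomial test against the null of no preference. This is a sketch; the 15-of-20 example outcome is hypothetical, and a full analysis would also correct for repeated testing across participants and days.

```python
from math import comb

def binom_p_value(k, n, p=0.5):
    """One-sided exact binomial test: probability of observing k or
    more successes out of n trials under the null hypothesis that the
    participant has no preference (p = 0.5)."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(k, n + 1))

# A participant who prefers the adapter response on 15 of the 20
# blind matched pairs at day 14:
p = binom_p_value(15, 20)      # ≈ 0.0207
significant = p < 0.05
```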
Tier 1 can be completed with no external resources beyond compute. Tier 2 requires API access to a capable LLM. Tier 3 requires ethical approval and participant recruitment. The tiers are designed to be sequential — positive results at each tier justify the investment in the next. A research team with access to Tier 1 validation could publish that result alone as a meaningful contribution to the privacy-preserving personalization literature.
→ [2, 3, 10]
This paper specifies the mechanism and validation design but does not close every question. The following open questions are genuine gaps, not rhetorical devices, and should be treated as research directions rather than objections.
None of these open questions undermine the mechanism. They are the natural boundary of what a position paper can establish without empirical data. The validation design in Section VI is designed precisely to answer questions 01 through 04. The mechanism is specified with enough precision to run that validation today.
→ [2, 3, 10]