<?xml version="1.0" encoding="utf-8"?><?xml-stylesheet type="text/xsl" href="atom.xsl"?>
<feed xmlns="http://www.w3.org/2005/Atom">
    <id>https://jitendersharma.dev/insights</id>
    <title>Jitender Sharma Blog</title>
    <updated>2026-06-27T00:00:00.000Z</updated>
    <generator>https://github.com/jpmonette/feed</generator>
    <link rel="alternate" href="https://jitendersharma.dev/insights"/>
    <subtitle>Jitender Sharma Blog</subtitle>
    <icon>https://jitendersharma.dev/img/favicon.ico</icon>
    <entry>
        <title type="html"><![CDATA[RAG Is Not a Database]]></title>
        <id>https://jitendersharma.dev/insights/rag-is-not-a-database</id>
        <link href="https://jitendersharma.dev/insights/rag-is-not-a-database"/>
        <updated>2026-06-27T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[RAG is runtime context construction at query time, not a storage layer you bolt onto an LLM.]]></summary>
        <content type="html"><![CDATA[<br>
<p><img decoding="async" loading="lazy" alt="RAG Is Not a Database" src="https://jitendersharma.dev/assets/images/rag-is-not-a-database-bc5dc7100b7796197864c96336fd1f6c.png" width="1536" height="1024" class="img_ev3q"></p>
<p>A team ships RAG, passes the demo, and three weeks later a user retrieves a document they were never allowed to see. The vector store did its job. The architecture was never there to stop it.</p>
<p>I see the same root cause every time: teams ask which vector database to buy before they have defined what "retrieval" means in their system. That question assumes RAG is a data layer: ingest documents, embed them, query at runtime, paste chunks into a prompt. Storage solved, problem solved.</p>
<p>It is not. A vector index is one component in a <strong>context construction pipeline</strong> that runs on every user request. Identity, freshness, ranking, abstention, and attribution all decide whether the model answers from evidence or invents from fluency. The database does not do that work. The architecture around it does.</p>
<p>This is an <strong>architecture breakdown</strong> of what RAG actually is in production.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>THE CLAIM</div><div class="admonitionContent_BuS1"><p><strong>RAG is not a database. It is runtime context construction:</strong> a governed pipeline that assembles the right evidence, for the right principal, at query time, before inference begins.</p><p>Treating RAG as storage leads teams to optimize embedding models and chunk sizes while skipping the layers that decide whether the answer is grounded: who may see which documents, which chunks survive ranking, and what happens when retrieval returns nothing worth citing.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-the-database-mental-model-fails">Why the database mental model fails<a href="https://jitendersharma.dev/insights/rag-is-not-a-database#why-the-database-mental-model-fails" class="hash-link" aria-label="Direct link to Why the database mental model fails" title="Direct link to Why the database mental model fails" translate="no">​</a></h2>
<p>The database framing is seductive because it maps to familiar CRUD workflows. Ingest PDFs. Chunk. Embed. Store. Query. Ship.</p>
<p>Production RAG does not look like that. At query time the system must:</p>
<ol>
<li class=""><strong>Scope retrieval to identity</strong> (not every user sees every chunk)</li>
<li class=""><strong>Retrieve candidates</strong> (often hybrid: lexical + vector + metadata filters)</li>
<li class=""><strong>Rank and filter</strong> (relevance is not cosine similarity alone)</li>
<li class=""><strong>Pack context</strong> (budget tokens, dedupe, attribute sources)</li>
<li class=""><strong>Decide whether to answer</strong> (abstain when evidence is thin)</li>
</ol>
<p>None of those steps live inside the vector store. The store holds vectors and metadata. The <strong>pipeline</strong> owns truth boundaries.</p>
<table><thead><tr><th>Database mental model</th><th>RAG as context construction</th></tr></thead><tbody><tr><td><strong>Primary job</strong></td><td>Persist and return stored records</td></tr><tr><td><strong>Success metric</strong></td><td>Query latency, index size</td></tr><tr><td><strong>Identity</strong></td><td>Often ignored until audit</td></tr><tr><td><strong>Failure mode</strong></td><td>Empty result set</td></tr><tr><td><strong>Ops focus</strong></td><td>Reindex when docs change</td></tr><tr><td><strong>Who owns quality</strong></td><td>Data engineering</td></tr></tbody></table>
<p>The gap shows up in regulated environments first. An auditor does not ask which vector DB you picked. They ask: <strong>who retrieved what, under which policy, and what did the model see?</strong> A database answer does not satisfy that question. A pipeline with identity-scoped retrieval, ranked context packs, and structured attribution does.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-actually-runs-at-query-time">What actually runs at query time<a href="https://jitendersharma.dev/insights/rag-is-not-a-database#what-actually-runs-at-query-time" class="hash-link" aria-label="Direct link to What actually runs at query time" title="Direct link to What actually runs at query time" translate="no">​</a></h2>
<p>RAG is not "fetch top-k chunks." It is a short-lived assembly line that produces a <strong>context pack</strong>: the bounded input the model is allowed to reason over.</p>
<!-- -->
<p>Four boundaries, one request:</p>
<ul>
<li class=""><strong>① Ingress:</strong> bind the question to a principal. Retrieval without identity is a data leak waiting for production traffic.</li>
<li class=""><strong>② Retrieval:</strong> candidate generation, not final context. Hybrid search and ACL filters shrink the candidate set before ranking spends compute.</li>
<li class=""><strong>③ Rank &amp; pack:</strong> re-ranking is where most quality wins hide. Token budgeting and deduplication turn "top-k blobs" into a coherent evidence pack.</li>
<li class=""><strong>④ Inference:</strong> the model reasons over the pack. Citation and abstention are system outcomes, not prompt wishes.</li>
</ul>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>The storage boundary</div><div class="admonitionContent_BuS1"><p><strong>The vector index stores candidates. It does not store truth.</strong></p><p>Truth is the outcome of the full pipeline: scoped retrieval, ranked evidence, attributed context, and an explicit decision to answer or abstain. Optimizing the index without designing these layers is how teams ship fluent wrong answers at scale.</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="demo-vs-production">Demo vs production<a href="https://jitendersharma.dev/insights/rag-is-not-a-database#demo-vs-production" class="hash-link" aria-label="Direct link to Demo vs production" title="Direct link to Demo vs production" translate="no">​</a></h3>
<table><thead><tr><th>Layer</th><th>Demo default</th><th>Production default</th></tr></thead><tbody><tr><td><strong>Identity</strong></td><td>Single shared index</td><td>Per-principal ACL on every retrieval path</td></tr><tr><td><strong>Retrieval</strong></td><td>Vector top-k</td><td>Hybrid search + metadata filters + freshness rules</td></tr><tr><td><strong>Ranking</strong></td><td>Skipped ("similarity is enough")</td><td>Re-ranker + score thresholds + dedupe</td></tr><tr><td><strong>Context pack</strong></td><td>Concatenate chunks</td><td>Token budget, source attribution, versioned templates</td></tr><tr><td><strong>Output</strong></td><td>Model free-text</td><td>Cite sources or abstain; log what entered the pack</td></tr><tr><td><strong>Change</strong></td><td>Re-embed when someone notices drift</td><td>Eval gate on index updates; replay for regulators</td></tr></tbody></table>
<p>The demo path works in a notebook. The production path is what survives the first compliance review.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-looks-like-when-it-breaks">What this looks like when it breaks<a href="https://jitendersharma.dev/insights/rag-is-not-a-database#what-this-looks-like-when-it-breaks" class="hash-link" aria-label="Direct link to What this looks like when it breaks" title="Direct link to What this looks like when it breaks" translate="no">​</a></h2>
<p>Teams living in the database mental model do not announce it. They ship features that look like RAG until production traffic arrives. Three symptoms show up first:</p>
<ul>
<li class=""><strong>Leakage.</strong> A user retrieves chunks from documents their role should never see. The vector store returned a valid result. The pipeline never bound retrieval to identity.</li>
<li class=""><strong>Confident wrong cites.</strong> The model answers with footnotes — and the sources do not support the claim. Cosine similarity passed; ranking and score thresholds never ran.</li>
<li class=""><strong>No replay story.</strong> An auditor asks what the model saw on March 12. The team has index stats and prompt logs, not the assembled context pack.</li>
</ul>
<p>Two failure modes get conflated: <strong>empty retrieval</strong> (nothing worth citing) and <strong>wrong retrieval</strong> (something plausible, not true). The first needs abstention. The second needs ranking, eval, and attribution. A database framing treats both as "bad query results." A pipeline framing treats them as distinct design problems.</p>
<p>Indexing is not where RAG quality is won. Teams spend months on chunking and embedding, then ship vector top-k at query time. Offline work is necessary. Scoped, ranked, attributable retrieval at query time is what production runs on.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-procurement-reframe">The procurement reframe<a href="https://jitendersharma.dev/insights/rag-is-not-a-database#the-procurement-reframe" class="hash-link" aria-label="Direct link to The procurement reframe" title="Direct link to The procurement reframe" translate="no">​</a></h2>
<p>Wrong question: "Which vector database?" Right questions:</p>
<ul>
<li class=""><strong>Identity:</strong> how does each retrieval path bind to the caller's claims?</li>
<li class=""><strong>Audit:</strong> what gets logged in the context pack for replay?</li>
<li class=""><strong>Abstention:</strong> when evidence falls below threshold, do you stop or guess?</li>
</ul>
<p>Freshness and scope answer to the pipeline too. Stale embeddings, document versions, who-may-see-what-today, sources spread across CRM, tickets, and policy engines: none of that lives in one datastore. Which is the whole point.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="where-i-actually-land">Where I actually land<a href="https://jitendersharma.dev/insights/rag-is-not-a-database#where-i-actually-land" class="hash-link" aria-label="Direct link to Where I actually land" title="Direct link to Where I actually land" translate="no">​</a></h2>
<p>I'm not saying vector stores don't matter, or that chunking is optional. You need storage. You need indexing. The mistake is stopping there.</p>
<p>The teams that ship trustworthy RAG treat the index as <strong>input to a pipeline</strong>, not the product. They design identity binding, ranking thresholds, context-pack logging, and abstention before they debate embedding dimensions. Those are the layers an auditor, a regulator, and a customer who acted on a wrong answer will actually hold you to.</p>
<p>Stop asking "which vector database?" Start asking <strong>"what assembles evidence for this principal, on this request, and what do we do when that assembly fails?"</strong></p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p><strong>RAG is not a database. It is runtime context construction</strong> scoped to identity, ranked for relevance, packed for the model, and auditable end to end.</p><p>In a demo, retrieval is a query. In production, retrieval is architecture.</p></div></div>]]></content>
        <author>
            <name>Jitender Sharma</name>
            <uri>https://jitendersharma.dev</uri>
        </author>
        <category label="AI & Intelligence" term="AI & Intelligence"/>
        <category label="Architecture" term="Architecture"/>
        <category label="RAG" term="RAG"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Policy-Governed Agent Runtime]]></title>
        <id>https://jitendersharma.dev/insights/policy-governed-agent-runtime</id>
        <link href="https://jitendersharma.dev/insights/policy-governed-agent-runtime"/>
        <updated>2026-06-25T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Proposal is not permission. Agents propose tool calls; governance decides whether they run. An architecture breakdown of runtime trust boundaries for production agent systems in regulated industries.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Policy-Governed Agent Runtime" src="https://jitendersharma.dev/assets/images/policy-governed-agent-runtime-eb7b48cfee4807c1575083320746a743.png" width="1402" height="1122" class="img_ev3q"></p>
<p>Enterprise teams, especially in <strong>banking and other regulated industries</strong>, are connecting agents to operational tools: payment rails, core banking APIs, KYC workflows, trade settlement. Most production designs still leave undefined where <strong>token</strong>, <strong>identity</strong>, and <strong>policy</strong> state live during execution. The failure mode is not "the model misbehaved." It is "we cannot prove who authorized what, with which policy, before money moved."</p>
<p>This is an <strong>architecture breakdown</strong> of runtime trust boundaries. The LLM operates on conversation and tool schemas only. The Identity Provider owns claims. The Policy Engine (PDP) returns verdicts. The Policy Enforcement Point (PEP) gates every tool invocation and forwards only what policy allows.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>THE CLAIM</div><div class="admonitionContent_BuS1"><p><strong>Proposal is not permission.</strong> An agent <strong>proposes</strong> tool calls. Governance <strong>decides</strong> whether they run. Policy in the prompt or the weights is not enforcement: it's a suggestion the model may ignore.</p><p>In a Policy-Governed Agent Runtime (PGAR), the token and policies stay out of the LLM. The model proposes. The PEP enforces. The PDP decides. Governance lives on the execution path, not in the system message: the same separation banks already enforce between a teller's screen and the authorization engine behind a wire transfer.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-whole-system-on-one-page">The whole system on one page<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#the-whole-system-on-one-page" class="hash-link" aria-label="Direct link to The whole system on one page" title="Direct link to The whole system on one page" translate="no">​</a></h2>
<p>Five trust boundaries. Token and policy never cross the LLM boundary (③).</p>
<ul>
<li class=""><strong>① Ingress</strong> (API Gateway + Identity Provider): receives the request, validates the token, issues claims</li>
<li class=""><strong>② Agentic App</strong>: holds the session and token; <strong>never</strong> sends either to the LLM</li>
<li class=""><strong>③ LLM</strong>: gets conversation + tool schemas only; proposes a tool call</li>
<li class=""><strong>④ Policy layer</strong> (PEP + PDP): receives the proposal + token; PDP returns a verdict</li>
<li class=""><strong>⑤ Downstream</strong> (Payment Hub): PEP calls only on <strong>Allow</strong>; service re-authorizes</li>
</ul>
<!-- -->
<p>Read it across those five boundaries: <strong>ingress → agentic app → LLM proposes → PEP asks PDP → downstream executes</strong>. Most agent security stops at ② and never builds a real ④ or ⑤ re-auth. The rest of this piece walks that path: why prompt guardrails and per-API auth fail, then one wire request traced end to end with the contracts underneath.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-prompt-guardrails-arent-authorization">Why prompt guardrails aren't authorization<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#why-prompt-guardrails-arent-authorization" class="hash-link" aria-label="Direct link to Why prompt guardrails aren't authorization" title="Direct link to Why prompt guardrails aren't authorization" translate="no">​</a></h2>
<p><strong>Proposal is not permission.</strong> Most production agents today are <strong>prompt-governed</strong>: rules in the system message, hope in the middle, tools at the end. That works until someone asks a regulator, a security team, or a compliance officer to explain <em>why</em> an agent initiated a $47,500 wire without four-eyes approval, or released a payment to a beneficiary that failed sanctions screening.</p>
<table><thead><tr><th></th><th>Prompt-based guardrails</th><th>PGAR</th></tr></thead><tbody><tr><td><strong>Where policy lives</strong></td><td>System prompt / fine-tuned behavior</td><td>PDP, enforced by PEP, outside the model</td></tr><tr><td><strong>Enforcement</strong></td><td>Probabilistic: model may comply</td><td>Deterministic. PEP blocks or allows on PDP verdict</td></tr><tr><td><strong>Token handling</strong></td><td>Often in context or env-injected</td><td>Agentic App + PEP only; never in LLM input</td></tr><tr><td><strong>Auditability</strong></td><td>"The model was told not to"</td><td>Structured PEP/PDP decision log per proposal</td></tr><tr><td><strong>Prompt injection resistance</strong></td><td>Weak: attacker rewrites the "rules"</td><td>Strong: attacker cannot see or rewrite PDP rules</td></tr><tr><td><strong>Failure mode</strong></td><td>Silent violation</td><td>Explicit deny or step-up</td></tr><tr><td><strong>Regulatory posture</strong></td><td>Hard to defend under model-risk or operational-resilience scrutiny</td><td>Verdict chain, policy version, and immutable audit: the artifacts examiners ask for</td></tr></tbody></table>
<p>In banking terms: prompt guardrails are like posting "do not exceed transaction limits" on the break-room wall. PGAR is the core authorization engine that actually holds or releases the payment.</p>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>AUTHORIZATION ≠ PROMPTING</div><div class="admonitionContent_BuS1"><p>Prompt guardrails shape behavior: tone, format, abstention. They are <strong>not</strong> a substitute for authorization. PGAR owns the layer guardrails were never built to hold.</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-per-api-authorization-isnt-enough">Why per-API authorization isn't enough<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#why-per-api-authorization-isnt-enough" class="hash-link" aria-label="Direct link to Why per-API authorization isn't enough" title="Direct link to Why per-API authorization isn't enough" translate="no">​</a></h3>
<p><em>We already authorize every REST call. Isn't that enough?</em></p>
<p>In microservices, deterministic code calls authorized APIs. Agents insert a <strong>probabilistic orchestrator</strong>: the LLM proposes tool calls (what, in what order, with what arguments) before any request leaves the runtime. Per-API auth decides whether <code>POST /wires</code> may run; it does not govern whether the <strong>agent should have proposed</strong> that wire for $47,500 without four-eyes attestation, or whether a multi-step chain (<code>lookup</code> → <code>validate</code> → <code>initiate</code>) satisfies compound policy across amount limits, sanctions context, and approval state. API access logs show that a call succeeded; they do not record <strong>which policy version allowed the proposal before side effects</strong>. PGAR does not replace downstream re-auth: Payment Hub still checks the token. The PEP governs the <strong>proposal mile</strong> between model output and API invocation, and writes the verdict chain examiners expect.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="prediction-vs-truth-on-the-execution-path">Prediction vs. truth on the execution path<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#prediction-vs-truth-on-the-execution-path" class="hash-link" aria-label="Direct link to Prediction vs. truth on the execution path" title="Direct link to Prediction vs. truth on the execution path" translate="no">​</a></h3>
<p>Regulated systems need both: in different places. The LLM is a <strong>predictor</strong>: it infers intent, sequences tool calls, and drafts user-facing language. That is appropriate work for a probabilistic model. <strong>Authorization is not prediction.</strong> Whether $47,500 exceeds a $25,000 limit, whether a beneficiary cleared sanctions, whether four-eyes attestation is present: these are boolean facts evaluated against policy, not continuations the model might get right most of the time.</p>
<table><thead><tr><th>Task</th><th>Who owns it</th><th>Why</th></tr></thead><tbody><tr><td>Parse "send wire to Acme for INV-8842"</td><td>LLM (proposal)</td><td>Intent and phrasing: prediction is fine</td></tr><tr><td>Decide if officer may initiate wire</td><td>PDP (verdict)</td><td>Entitlement: must be deterministic</td></tr><tr><td>Compare amount to <code>wire.auto_approved</code></td><td>PDP (verdict)</td><td>Limit check: arithmetic, not fluency</td></tr><tr><td>Screen beneficiary against sanctions</td><td>Payment Hub + PDP</td><td>External truth: the model has no source</td></tr><tr><td>Record who approved before funds move</td><td>PEP (audit)</td><td>Evidence: cannot be inferred</td></tr></tbody></table>
<p>Put limits, entitlements, and sanctions in the prompt and you have delegated <strong>truth to a predictor</strong>. PGAR keeps prediction upstream (what to propose) and truth on the execution path (whether it may run). Treating "the model usually respects the rules" as authorization evidence fails model-risk and operational-resilience review: not because the model is bad, but because examiners require <strong>replayable verdicts</strong>, not plausible behavior.</p>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>PREDICTION VS. TRUTH</div><div class="admonitionContent_BuS1"><p><strong>Intelligence in the LLM. Truth in the PDP.</strong> Never conflate proposal with permission on the path that moves money, data, or regulatory scope.
This is the same thesis as <a class="" href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem">"Hallucination" is a design problem</a>: reliability and control live in the <strong>system around the model</strong>. PGAR is what that looks like when the system needs to <strong>authorize actions</strong>, not just validate answers.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="corporate-wire-one-request-through-five-boundaries">Corporate wire: one request through five boundaries<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#corporate-wire-one-request-through-five-boundaries" class="hash-link" aria-label="Direct link to Corporate wire: one request through five boundaries" title="Direct link to Corporate wire: one request through five boundaries" translate="no">​</a></h2>
<p>The overview diagram shows five trust boundaries; the sequence below walks every hop inside them.</p>
<p>User says: <em>"Send $47,500 to Acme Supplies for invoice INV-8842: use our operating account."</em> The LLM sees three tool schemas. <code>lookup_beneficiary</code>, <code>validate_payment</code>, <code>initiate_wire</code>. With no authority attached. This request exercises all three: <strong>lookup</strong> the payee, <strong>validate</strong> the payment (limits, sanctions, cut-off), <strong>initiate</strong> the wire. The PDP watches three risk triggers the model never sees: <strong>amount above auto-approval limit</strong> (STEP-UP), <strong>scope or entitlement violation</strong> (DENY), and <strong>sanctions or high-risk corridor hit</strong> (DENY or STEP-UP).</p>
<p>When something goes wrong: or during a scheduled review: compliance and regulators do not ask "what did the model intend?" They ask:</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>WHAT EXAMINERS ASK</div><div class="admonitionContent_BuS1"><ul>
<li class=""><strong>Which policy version decided?</strong> Every PEP log must carry <code>pgar.payments.wire/v3</code> (or equivalent), not "the system prompt from Tuesday."</li>
<li class=""><strong>Was execution blocked until attestation?</strong> Proof that STEP-UP fired and ALLOW came only after supervisor four-eyes, not after the model "felt confident."</li>
<li class=""><strong>Can you replay the verdict chain without model logs?</strong> Subject, action, resource, context, verdict: immutable, before side effects. Chat transcripts are discovery; PEP/PDP records are evidence.</li>
</ul><p>Prompt-governed agents struggle on all three. PGAR is built to answer them by construction.</p></div></div>
<!-- -->
<p><strong>Step-up is a PDP verdict, not a model feature.</strong> The model was never wrong for proposing $47,500: it was never given the $25,000 auto-approval limit. The PEP surfaces STEP-UP, the Agentic App owns four-eyes approval UX, and only a subsequent <strong>Allow</strong> reaches the Payment Hub. That attestation is what lands in the compliance archive: who authorized the exception, against which policy version, before a single dollar moved.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="when-the-pdp-says-deny">When the PDP says DENY<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#when-the-pdp-says-deny" class="hash-link" aria-label="Direct link to When the PDP says DENY" title="Direct link to When the PDP says DENY" translate="no">​</a></h3>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>EXPLICIT DENY, NOT SILENT VIOLATION</div><div class="admonitionContent_BuS1"><p>The sequence above walks STEP-UP. The same architecture handles the case regulators care about most. App forwards <code>validate_payment</code> to the PEP. Payment Hub returns <code>sanctions_status: hit</code>. PEP asks the PDP; verdict is <strong>DENY</strong>. The flow stops: <code>initiate_wire</code> is never proposed to execution, no amount argument, no model override, no "we told it not to." The audit record shows DENY with policy version and redacted context <strong>before</strong> any funds move. Prompt-governed agents fail silently here: the model may still propose the wire, or explain around the block in fluent language, with no immutable evidence that authorization was refused. PGAR turns that into an explicit, replayable block: the failure mode AML and sanctions examiners expect when a control trips.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="from-diagram-to-contracts">From diagram to contracts<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#from-diagram-to-contracts" class="hash-link" aria-label="Direct link to From diagram to contracts" title="Direct link to From diagram to contracts" translate="no">​</a></h2>
<p>If you can answer examiner questions from PEP/PDP logs alone, you have PGAR. If you need the chat transcript, you don't. The sequence diagram is the story; these payloads are the contracts that make the verdict chain replayable.</p>
<p>Every PDP evaluation uses the same four-field shape: <strong>who</strong> (subject), <strong>what</strong> (action), <strong>on what</strong> (resource), <strong>under what conditions</strong> (context).</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-token-and-claims-stay-in-the-agentic-app">1. Token and claims stay in the Agentic App<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#1-token-and-claims-stay-in-the-agentic-app" class="hash-link" aria-label="Direct link to 1. Token and claims stay in the Agentic App" title="Direct link to 1. Token and claims stay in the Agentic App" translate="no">​</a></h3>
<p>The token never enters the LLM request. It stays in the session and attaches to every Agentic App → PEP → Payment Hub call.</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"token"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"eyJhbGciOiJSUzI1NiIs..."</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"claims"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"iss"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"https://idp.bank.example"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"sub"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"officer-123"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"email"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"jitender@bank.example"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"act"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"sub"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"officer-123"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"sct"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"access"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"roles"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"corporate_banking_officer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"payments_initiator"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"emt_iat"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718812800</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"emt_exp"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718899200</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"emts"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"payments.lookup"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"payments.validate"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"payments.wire.initiate"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"limits"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"wire.auto_approved"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">25000</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"wire.above_requires"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"supervisor_four_eyes"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"portfolio_accounts"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"acct-operating-4412"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"iat"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718812800</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"exp"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718816400</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-what-the-llm-actually-sees">2. What the LLM actually sees<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#2-what-the-llm-actually-sees" class="hash-link" aria-label="Direct link to 2. What the LLM actually sees" title="Direct link to 2. What the LLM actually sees" translate="no">​</a></h3>
<p>This is the payload that crosses the Agentic App → LLM boundary. Notice what's missing.</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"messages"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"role"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"user"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"content"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"Send $47,500 to Acme Supplies for invoice INV-8842: use our operating account."</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"tools"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"lookup_beneficiary"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"parameters"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"payee_name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"invoice_ref"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"validate_payment"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"parameters"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"beneficiary_id"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"amount"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"number"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"source_account"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"reference"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"name"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"initiate_wire"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"parameters"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"beneficiary_id"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"amount"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"number"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"source_account"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"reference"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"string"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<p>No <code>Authorization</code> header. No <code>roles</code>, <code>emts</code>, or <code>limits</code>. No policy text.</p>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>THE PGAR TEST</div><div class="admonitionContent_BuS1"><p>If any of those appear in your LLM payload, you don't have PGAR. You have prompt governance.</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-what-the-pep-sends-to-the-pdp">3. What the PEP sends to the PDP<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#3-what-the-pep-sends-to-the-pdp" class="hash-link" aria-label="Direct link to 3. What the PEP sends to the PDP" title="Direct link to 3. What the PEP sends to the PDP" translate="no">​</a></h3>
<p>The PEP doesn't send natural language to the PDP. It maps session <strong>claims</strong> into <strong>subject</strong>, adds the tool proposal as <strong>action</strong>, <strong>resource</strong>, and <strong>context</strong>, and calls the PDP.</p>
<table><thead><tr><th>Field</th><th>Source</th><th>Carries</th></tr></thead><tbody><tr><td><strong>subject</strong></td><td>Session <code>claims</code> (see above)</td><td>Who: same identity, roles, entitlements, and limits held in the Agentic App</td></tr><tr><td><strong>action</strong></td><td>Tool proposal name</td><td>What. <code>lookup_beneficiary</code>, <code>validate_payment</code>, <code>initiate_wire</code></td></tr><tr><td><strong>resource</strong></td><td>Tool proposal target</td><td>On what: beneficiary, source account, wire payment</td></tr><tr><td><strong>context</strong></td><td>Proposal + runtime state</td><td>Conditions. <code>amount</code>, <code>sanctions_status</code>, <code>approval</code></td></tr></tbody></table>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"subject"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"iss"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"https://idp.bank.example"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"sub"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"officer-123"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"email"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"jitender@bank.example"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"act"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"sub"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"officer-123"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"sct"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"> </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"access"</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"roles"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"corporate_banking_officer"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"payments_initiator"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"emt_iat"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718812800</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"emt_exp"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718899200</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"emts"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"payments.lookup"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"payments.validate"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"payments.wire.initiate"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"limits"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"wire.auto_approved"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">25000</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"wire.above_requires"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"supervisor_four_eyes"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"portfolio_accounts"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"acct-operating-4412"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"iat"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718812800</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"exp"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">1718816400</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"initiate_wire"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"resource"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"type"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"wire_payment"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"beneficiary_id"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"bene-acme-441"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"source_account"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"acct-operating-4412"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"reference"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"INV-8842"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"context"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"amount"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token number" style="color:#36acaa">47500</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"sanctions_status"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"clear"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token property" style="color:#36acaa">"approval"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token null keyword" style="color:#00009f">null</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<p>After step-up, the same request returns with <code>context.approval</code> set to <code>{ "type": "supervisor_four_eyes", "attestation_id": "apr-9f2c" }</code>. And the PDP re-evaluates.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-policy-rules-three-verdicts-no-fourth-option">4. Policy rules: three verdicts, no fourth option<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#4-policy-rules-three-verdicts-no-fourth-option" class="hash-link" aria-label="Direct link to 4. Policy rules: three verdicts, no fourth option" title="Direct link to 4. Policy rules: three verdicts, no fourth option" translate="no">​</a></h3>
<p>The PDP runs one policy surface. Three outcomes only: <strong>ALLOW</strong>, <strong>DENY</strong>, <strong>STEP_UP</strong>. No "the model said it was fine."</p>
<div class="language-json codeBlockContainer_Ckt0 theme-code-block" style="--prism-color:#393A34;--prism-background-color:#f6f8fa"><div class="codeBlockContent_QJqH"><pre tabindex="0" class="prism-code language-json codeBlock_bY9V thin-scrollbar" style="color:#393A34;background-color:#f6f8fa"><code class="codeBlockLines_e6Vv"><div class="token-line" style="color:#393A34"><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"policy_id"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"pgar.payments.wire"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"default_decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"DENY"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token property" style="color:#36acaa">"rules"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"DENY"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"when"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"initiate_wire"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"validate_payment"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"subject.emts.payments.wire.initiate"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">false</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"DENY"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"when"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">[</span><span class="token string" style="color:#e3116c">"validate_payment"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"initiate_wire"</span><span class="token punctuation" style="color:#393A34">]</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.sanctions_status"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"hit"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"ALLOW"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"when"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"lookup_beneficiary"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"subject.emts.payments.lookup"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"ALLOW"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"when"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"validate_payment"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"subject.emts.payments.validate"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.sanctions_status"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"clear"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"ALLOW"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"when"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"initiate_wire"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.amount.lte"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"subject.limits.wire.auto_approved"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.sanctions_status"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"clear"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"STEP_UP"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"when"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"initiate_wire"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.amount.gt"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"subject.limits.wire.auto_approved"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.approval"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token null keyword" style="color:#00009f">null</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"decision"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"ALLOW"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token property" style="color:#36acaa">"when"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token punctuation" style="color:#393A34">{</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"action"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"initiate_wire"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.amount.gt"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"subject.limits.wire.auto_approved"</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.approval.present"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token boolean" style="color:#36acaa">true</span><span class="token punctuation" style="color:#393A34">,</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">        </span><span class="token property" style="color:#36acaa">"context.sanctions_status"</span><span class="token operator" style="color:#393A34">:</span><span class="token plain"> </span><span class="token string" style="color:#e3116c">"clear"</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">      </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">    </span><span class="token punctuation" style="color:#393A34">}</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain">  </span><span class="token punctuation" style="color:#393A34">]</span><span class="token plain"></span><br></div><div class="token-line" style="color:#393A34"><span class="token plain"></span><span class="token punctuation" style="color:#393A34">}</span><br></div></code></pre></div></div>
<p>OPA, Cedar, your IAM PDP, or an internal rules engine can implement this surface: the requirement is <strong>structured input, deterministic output</strong>, evaluated by the PDP, not natural-language policy in a system prompt.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-the-pep-structural-not-conventional">5. The PEP. Structural, not conventional<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#5-the-pep-structural-not-conventional" class="hash-link" aria-label="Direct link to 5. The PEP. Structural, not conventional" title="Direct link to 5. The PEP. Structural, not conventional" translate="no">​</a></h3>
<p>The PEP sits between <em>proposal</em> and <em>execution</em>. The Agentic App cannot call the Payment Hub directly: every path goes through the PEP, which runs the same four steps on every proposal:</p>
<ol>
<li class=""><strong>Receive the input</strong>: the tool proposal (<code>initiate_wire</code> to <code>bene-acme-441</code> for $47,500, no approval yet), the bearer token, and the subject's claims.</li>
<li class=""><strong>Assemble and ask the PDP</strong>: map proposal and claims into the subject/action/resource/context request and call the PDP. Here the PDP returns <strong>STEP_UP</strong>, reason <code>wire_above_auto_approved</code>.</li>
<li class=""><strong>Write the audit record</strong>: every verdict is logged with the subject, action, resource, redacted context, the policy version that decided it (<code>pgar.payments.wire/v3</code>), and the verdict itself: immutable, before any side effect.</li>
</ol>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>DECISION FIRST, EXECUTION SECOND</div><div class="admonitionContent_BuS1"><p>This is the record operational-resilience and model-risk reviewers expect: verdict logged before any side effect: no retroactive narrative.</p></div></div>
<ol start="4">
<li class=""><strong>Act on the verdict</strong>: only <strong>ALLOW</strong> reaches the Payment Hub; <strong>STEP_UP</strong> returns a step-up-required response to the Agentic App; <strong>DENY</strong> returns a refusal. In this case the PEP responds <em>not executed, step-up required</em>. The wire never touched the payment rail.</li>
</ol>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>STRUCTURAL ENFORCEMENT</div><div class="admonitionContent_BuS1"><p>If the Agentic App can call downstream services <strong>without</strong> passing through the PEP, you don't have enforcement: you have a suggestion. The choke point must be structural, not conventional.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-this-is-an-architecture-problem-not-a-sprint-item">Why this is an architecture problem, not a sprint item<a href="https://jitendersharma.dev/insights/policy-governed-agent-runtime#why-this-is-an-architecture-problem-not-a-sprint-item" class="hash-link" aria-label="Direct link to Why this is an architecture problem, not a sprint item" title="Direct link to Why this is an architecture problem, not a sprint item" translate="no">​</a></h2>
<p>You can buy an agent framework in an afternoon. You cannot buy the <strong>boundary decisions</strong> PGAR requires: those are architecture commitments someone will have to defend to security, finance, internal audit, and regulators.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>WHO MUST DEFEND THIS</div><div class="admonitionContent_BuS1"><p>In a bank, that conversation happens with model-risk management, second-line compliance, and the teams who already own payment authorization: not only with the squad shipping the chatbot.</p></div></div>
<table><thead><tr><th>Engineering thinks…</th><th>Architecture decides…</th></tr></thead><tbody><tr><td>"Put the wire limit in the system prompt"</td><td>Where policy is evaluated. PDP, deterministically, on structured input at the PEP</td></tr><tr><td>"The model will learn to respect the rules"</td><td>What the LLM is allowed to see: schemas yes, credentials and entitlements no</td></tr><tr><td>"We'll add auth later"</td><td>Whether every path to downstream services goes through the PEP or some paths bypass it</td></tr><tr><td>"Identity is the IdP team's problem"</td><td>How claims flow to the PDP without ever reaching the LLM</td></tr><tr><td>"Logging is a nice-to-have"</td><td>Which PEP/PDP decisions are immutable audit events vs. sampled debug traces</td></tr><tr><td>"One team owns the agent"</td><td>Who owns the gateway, identity, policy, and service boundaries: four different stakeholders, often four different lines of defense in a regulated firm</td></tr></tbody></table>
<p>PGAR is the control surface for <em>actions</em> in a governed agent stack: intelligence stays in the LLM, control in the PEP + PDP, and every verdict is an audit-grade event, not a sampled trace.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p><strong>Proposal is not permission.</strong> The LLM proposes. The Agentic App holds the token. The PDP decides. The PEP enforces. Downstream services re-authorize. That is PGAR: governance as architecture, not as a paragraph in the system prompt.</p><p>If the Agentic App can reach downstream without the PEP, you have a demo, not governed production.</p></div></div>]]></content>
        <author>
            <name>Jitender Sharma</name>
            <uri>https://jitendersharma.dev</uri>
        </author>
        <category label="Governance & Trust" term="Governance & Trust"/>
        <category label="Architecture" term="Architecture"/>
        <category label="Policy" term="Policy"/>
        <category label="Compliance" term="Compliance"/>
        <category label="Agents" term="Agents"/>
        <category label="LLM" term="LLM"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[AI Observability In Enterprise]]></title>
        <id>https://jitendersharma.dev/insights/ai-observability-in-enterprise</id>
        <link href="https://jitendersharma.dev/insights/ai-observability-in-enterprise"/>
        <updated>2026-06-18T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[AI observability is not a dashboard. It is a capture-and-retention architecture with five signals, five retention policies, and four consumers.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="AI Observability In Enterprise" src="https://jitendersharma.dev/assets/images/ai-observability-in-enterprise-55eeff32f54218da0d324483661c8048.png" width="1536" height="1024" class="img_ev3q"></p>
<p>Everyone says "monitor your AI in production". Almost nobody draws the system that does it. "Add Observability" is a slogan until you can say <strong>exactly what gets captured, where it lands, how long it lives, and who reads it.</strong></p>
<p>This is an <strong>architecture breakdown</strong> - capture in the request path, fan-out into purpose-built storage tiers, and four very different consumers reading off them. The headline: AI observability isn't one thing. It's <strong>five signals with five retention policies feeding four jobs</strong>, and the regulator-facing ones look nothing like the dashboard-facing ones.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>THE CLAIM</div><div class="admonitionContent_BuS1"><p>AI observability is not "a dashboard". It's a <strong>capture-and-retention architecture</strong>: each signal (logs, metrics, traces, raw prompts, audit records) has a different consumer, a different retention window, and a different blast radius if you get it wrong.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-whole-system-on-one-page">The whole system on one page<a href="https://jitendersharma.dev/insights/ai-observability-in-enterprise#the-whole-system-on-one-page" class="hash-link" aria-label="Direct link to The whole system on one page" title="Direct link to The whole system on one page" translate="no">​</a></h2>
<!-- -->
<p>Read it left to right: <strong>capture -&gt; store -&gt; consumer</strong>. The rest of this piece is just the reasoning behind each arrow.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>This isn't only for AI</div><div class="admonitionContent_BuS1"><p>The <code>capture-&gt;store-&gt;consume</code> backbone here isn't AI-specific. Swap the <strong>Agentic app/ RAG service</strong> node for a microservice, a VM-hosted app, or a cots product and the skeleton is unchanged: emit OTel signals, fan them out to tiers wit deliberate retention, feed operational / SLO/ audit consumers. Only <strong>two boxes are the AI-specific part</strong>,
the <em>raw prompt/response</em> store and the <em>drift detector</em>. Drop those and you're left with a perfectly standard service-observability architecture. So you don't need a different observability sta for non-agentic systems, you just need fewer arrows the same one.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-capture-lives-in-the-request-path---and-thats-the-hard-constraint">1. Capture lives in the request path - and that's the hard constraint<a href="https://jitendersharma.dev/insights/ai-observability-in-enterprise#1-capture-lives-in-the-request-path---and-thats-the-hard-constraint" class="hash-link" aria-label="Direct link to 1. Capture lives in the request path - and that's the hard constraint" title="Direct link to 1. Capture lives in the request path - and that's the hard constraint" translate="no">​</a></h2>
<p>The app: an agent, a RAG service, any LLM system: emits <strong>five signals</strong> through an OTel SDK into an <strong>OTel collector</strong> on the hot path: <strong>logs, metrics, traces</strong> (standard OpenTelemetry) plus <strong>raw prompt/response</strong> and <strong>audit records</strong> (governed, AI-specific). Two design consequences fall out immediately:</p>
<ul>
<li class=""><strong>Instrumentation is not free.</strong> Every signal you emit costs latency and money on the request path. That's why the boring signals (metrics) are cheap and always-on, while the expensive ones (traces, raw payloads) are <strong>sampled</strong> or <strong>gated</strong>.</li>
<li class=""><strong>The Collector is the control point.</strong> Routing, sampling, redaction, and fan-out happen <em>once</em>, in the Collector - not scattered across app code. This is where you strip PII before it every reaches a long-lived store.</li>
</ul>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>Using vendor neutral <strong>OpenTelemetry</strong> at the capture layer is the decision that keeps your backwards swappable. The signals are standardized; where they land is a routing config, not a rewrite.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-five-signal-five-storage-tiers-five-retention-policies">2. Five Signal, Five storage tiers, five retention policies<a href="https://jitendersharma.dev/insights/ai-observability-in-enterprise#2-five-signal-five-storage-tiers-five-retention-policies" class="hash-link" aria-label="Direct link to 2. Five Signal, Five storage tiers, five retention policies" title="Direct link to 2. Five Signal, Five storage tiers, five retention policies" translate="no">​</a></h2>
<p>This is the part most "monitoring" setups collapse into on bucket - and it's exactly where AI system's differ from ordinary services. <strong>Retention is a governance decision, not a storage default</strong>.</p>
<table><thead><tr><th><strong>Signal</strong></th><th><strong>Store</strong></th><th><strong>Retention</strong></th><th><strong>Why this window</strong></th></tr></thead><tbody><tr><td><strong>Structured Logs</strong></td><td>Log store</td><td><strong>30 d</strong></td><td>Operational debugging; cheap to keep short, noisy to keep long</td></tr><tr><td><strong>Metrics</strong></td><td>Time Series DB (TSDB)</td><td><strong>13 mo</strong></td><td>Trend + year-over-year comparison, tiny per-point cost</td></tr><tr><td><strong>Sampled Traces</strong></td><td>Trace store</td><td><strong>30 d</strong></td><td>Latency/causality debugging; full traces are expensive, so sample</td></tr><tr><td><strong>Raw prompt/response</strong></td><td>Restricted store</td><td><strong>encrypted, 90 d</strong></td><td>Sensitive content: quality/drift analysis, tightly access-controlled</td></tr><tr><td><strong>Audit record</strong></td><td>Audit log</td><td><strong>immutable, 7 y</strong></td><td>Compliance evidence: must survive, must not be editable</td></tr></tbody></table>
<p>The two dotted arrows in the diagram matter. <strong>Raw prompt/response</strong> and <strong>audit records</strong> are not routine telemetry - they are <strong>sensitive, governed</strong> signals. One is encrypted and short-lived; the other is immutable and kept for years. Treating either like a normal log is how you end up with PII in a debug dashboard or a compliance gap at audit time.</p>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>important</div><div class="admonitionContent_BuS1"><p>If your "observability" stores everything in one tier with one retention setting, you have made a governance decision by accident. The raw-prompt store and the audit log have <strong>opposite</strong> requirements <em>short + erasable vs long + immutable</em> and conflating them fails both.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-four-consumers-four-different-questions">3. Four consumers, four different questions<a href="https://jitendersharma.dev/insights/ai-observability-in-enterprise#3-four-consumers-four-different-questions" class="hash-link" aria-label="Direct link to 3. Four consumers, four different questions" title="Direct link to 3. Four consumers, four different questions" translate="no">​</a></h2>
<p>Storage isn't the point; the questions you can answer are. Each consumer reads a different tier.</p>
<ul>
<li class=""><strong>Dashboards</strong> (logs + metrics + traces) - <em>what is the system doing right now</em>? The operational view.</li>
<li class=""><strong>SLO + burn-rate alerts</strong> (metrics) - <em>are we spending our error budget too fast?</em> Pages a human before users feel it.</li>
<li class=""><strong>Drift detector</strong> (traces + raw prompts + embeddings) - <em>is the input distribution moving away from what we tested - and from RAG, is the retrieval corpus drifting too</em>? This is the AI-specific one; model quality erodes silently as the world changes.</li>
<li class=""><strong>Regulatory replay</strong> (audit log) - <em>can we reconstruct exactly what the system did, months later, for someone who wasn't there?</em> The immutable trail.</li>
</ul>
<!-- -->
<p>The split is the insight: <strong>operational health, model-quality erosion, and provable accountability are three different jobs.</strong> A latency dashboard tells you nothing about drift. A drift detector can't satisfy an auditor. You need all three, fed by the right tiers.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-this-is-an-architecture-problem-not-a-tooling-purchase">Why this is an architecture problem, not a tooling purchase<a href="https://jitendersharma.dev/insights/ai-observability-in-enterprise#why-this-is-an-architecture-problem-not-a-tooling-purchase" class="hash-link" aria-label="Direct link to Why this is an architecture problem, not a tooling purchase" title="Direct link to Why this is an architecture problem, not a tooling purchase" translate="no">​</a></h2>
<p>You can buy dashboard. You cannot buy the <strong>decision</strong> in this diagram.</p>
<ul>
<li class=""><strong>What to sample</strong> (trace, raw payloads) vs <strong>always capture</strong> (metrics): a latency/cost trade off.</li>
<li class=""><strong>where redaction happens</strong> (the collector, before persistence): a privacy boundary.</li>
<li class=""><strong>Which tier is immutable</strong> (the audit log): a compliance commitment you design in, not bolt on.</li>
<li class=""><strong>What "healthy" means</strong>  (the SLOs and drift thresholds): domain knowledge no tool ships with.</li>
</ul>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>This is the same thesis as <a class="" href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem">"Hallucination" is a design problem:</a> reliability lives in the <strong>system around the model.</strong> Observability is how you <em>measure</em> that reliability: groundedness, unsupported-claim rate and drift become metrics you log the way you'd log latency.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-precise-position">The precise position<a href="https://jitendersharma.dev/insights/ai-observability-in-enterprise#the-precise-position" class="hash-link" aria-label="Direct link to The precise position" title="Direct link to The precise position" translate="no">​</a></h2>
<p>Most teams stand up a metrics dashboard, call it "AI observability," and move on. That covers exactly one of the four consumer above and not the two that regulators and quality erosion will eventually make you care about.</p>
<p>The architecture that actually holds up captures <strong>five signals with deliberate retention</strong>, redacts <strong>at the collector</strong> and feeds <strong>four distinct consumers</strong>: operational, budget, drift and audit. The diagram isn't decoration; it's the set of decisions you will be asked to defend.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>"Monitor your AI" is a slogan. <strong>Capture five signals, route them to tiers with deliberate retention, and feed four consumers, dashboards, SLO alerts, drift detection, and regulatory replay.</strong> That's the system, everything else is a dashboard pretending to be a strategy.</p></div></div>]]></content>
        <author>
            <name>Jitender Sharma</name>
            <uri>https://jitendersharma.dev</uri>
        </author>
        <category label="Platforms & Engineering" term="Platforms & Engineering"/>
        <category label="Architecture" term="Architecture"/>
        <category label="G.A.I.N" term="G.A.I.N"/>
        <category label="Observability" term="Observability"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[Hallucinations Is a System Design Problem]]></title>
        <id>https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem</id>
        <link href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem"/>
        <updated>2026-06-16T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[Hallucination is not the model failing. It is the model succeeding at the wrong objective in a system that never gave it the right one.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="Hallucinations Is a System Design Problem, Not a Model Problem" src="https://jitendersharma.dev/assets/images/hallucinations-69a7823f450757ca5783b0de28369679.png" width="1254" height="705" class="img_ev3q"></p>
<p>Every time a model invents a citation, the conversation jumps to "which model hallucinates less?". That's the wrong question. The model did exactly what it was built to do. Everyone's focused on <strong>picking the model that hallucinates least</strong>.</p>
<p>The thing that will actually decide whether your AI system is trustworthy is <strong>the architecture you wrap around the model</strong> – grounding, retrieval, validation, and an explicit path to "I don't know".</p>
<p>A hallucination isn't a bug the next checkpoint will patch. It's the <strong>expected behavior</strong> of a frozen, probabilistic next-token predictor asked a question it has no grounded answer for. Treating it as a model defect means you keep waiting for a fix that isn't coming. Treating it as a design problem means you can actually solve it today.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span> THE CLAIM</div><div class="admonitionContent_BuS1"><p>Hallucination is not the model failing. It's the model succeeding at the wrong objective – fluent continuation – in a system that never gave it the right one: grounded truth.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-the-model-was-never-going-to-save-you">Why the model was never going to save you<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#why-the-model-was-never-going-to-save-you" class="hash-link" aria-label="Direct link to Why the model was never going to save you" title="Direct link to Why the model was never going to save you" translate="no">​</a></h2>
<p>A trained model is a <strong>frozen function</strong>: <code>f(tokens) -&gt; next-token probabilities</code>. It has no live knowledge, no source of truth, and no built-in concept of “I don't actually know this”. Three properties make hallucinations structural, not accidental:</p>
<table><thead><tr><th>Property of the model</th><th>Consequence</th></tr></thead><tbody><tr><td><strong>Frozen at training time</strong></td><td>No access to fresh, private or post-cutoff facts - it fills gaps from priors</td></tr><tr><td><strong>Optimized for fluency, not truth</strong></td><td>The objective was plausible next token, never verified fact</td></tr><tr><td><strong>No native abstention</strong></td><td>“Confidently wrong” scores the same as confident and right unless the system checks</td></tr></tbody></table>
<p>So when you ask something outside what it learned, it doesn't error out - it produces the most statistically plausible continuation. That continuation is often fluent, well-formatted, and wrong. The model isn't broken. It's doing precisely what next-token prediction does.</p>
<p>The model invents a citation because inventing a plausible continuation is the only thing it was ever built to do - truth was never in its objective, so it has to be in your architecture.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>A bigger or newer model shifts where the cliff is, not that there is a cliff. You're buying a lower hallucination rate, not a guarantee. Rates don't survive contact with a regulator, an auditor, or a customer who was given a fake policy number.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="why-this-is-a-design-problem-the-enterprise-lens">Why this is a design problem (the enterprise lens)<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#why-this-is-a-design-problem-the-enterprise-lens" class="hash-link" aria-label="Direct link to Why this is a design problem (the enterprise lens)" title="Direct link to Why this is a design problem (the enterprise lens)" translate="no">​</a></h2>
<p>If the model can't be the source of truth, <strong>the system has to be</strong>. That reframes hallucinations from "model quality" to "system design" - and design is something you control.</p>
<ul>
<li class=""><strong>Grounding is an architecture choice, not a model feature</strong>. RAG exists precisely because the model's knowledge is frozen. Inject the right context at runtime and the model is <em>continuing from facts</em> instead of <em>inventing from priors</em>. No retrieval layer = you've delegated truth to a frozen function and hoped.</li>
<li class=""><strong>Validation lives outside the model</strong>. Guardrails, schema/grounding checks, and citation verifications sit <em>around</em> the model - you can't patch behaviors inside frozen weights in real time. The system decides what's allowed to reach the user, not the model.</li>
<li class=""><strong>"I don't know" must be an engineered path</strong>. Models don't volunteer abstention. Confidence thresholds, retrieval-coverage checks, and explicit fallbacks are what turn a confident guess into an honest "I can't answer that from sources I have."</li>
<li class=""><strong>Cost and governance ride on this</strong>. An ungrounded answer in a bank, a hospital, or a legal workflow isn't a quality blip - it's liability. Design decides whether a wrong answer is impossible to surface or merely cheap to retry.</li>
</ul>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>important</div><div class="admonitionContent_BuS1"><p>The <strong>intelligence</strong> is in the model. The <strong>truth</strong> is in the system. If your architecture has no component that owns "is this actually true and supported?", then nothing does - and the model will happily fill the silence.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="non-determinism-is-not-hallucination">Non-determinism is not hallucination<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#non-determinism-is-not-hallucination" class="hash-link" aria-label="Direct link to Non-determinism is not hallucination" title="Direct link to Non-determinism is not hallucination" translate="no">​</a></h2>
<p>This is the objection we hear most, and it's the strongest argument for the design framing - not against it. But it actually bundles two different things together.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="different-answers--hallucinations">Different answers ≠ Hallucinations<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#different-answers--hallucinations" class="hash-link" aria-label="Direct link to Different answers ≠ Hallucinations" title="Direct link to Different answers ≠ Hallucinations" translate="no">​</a></h3>
<table><thead><tr><th></th><th><strong>Non-determinism</strong></th><th><strong>Hallucination</strong></th></tr></thead><tbody><tr><td>What it is</td><td>Different wording for the same question</td><td>A <em>confident false claim</em></td></tr><tr><td>Cause</td><td><strong>Sampling</strong> (temperature, top-p) picks among probable tokens</td><td>No grounded fact, so it continues from priors</td></tr><tr><td>Your control</td><td>Yes - set <code>temperature=0</code></td><td>Only via grounding + verification</td></tr></tbody></table>
<p>The model never stores "an answer". Each step it produces a <strong>probability distribution</strong> over the next token, then <em>samples</em> from it. At <code>temperature &gt; 0</code> you are rolling a weighted dice every token - hence different phrasings. Set <code>temperature = 0</code> (greedy decoding) and it becomes <strong>near-deterministic</strong>: same input -&gt; same output.</p>
<br>
<p><code>(near, because floating-point rounding and GPU batching cause tiny variations - an engineering detail, not the core issue.)</code></p>
<br>
<p>So "different answers each time" is a <strong>knob you control</strong>, not proof the model is reliable.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="there-is-no-100-surety--and-thats-the-whole-point">There is no 100% surety – and that’s the whole point<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#there-is-no-100-surety--and-thats-the-whole-point" class="hash-link" aria-label="Direct link to There is no 100% surety – and that’s the whole point" title="Direct link to There is no 100% surety – and that’s the whole point" translate="no">​</a></h3>
<p>Grounding does not guarantee a correct answer. It shifts the probability mass. Without context, the most-probable continuation comes from <em>fuzzy</em> training priors (high risk). With the right context in the prompt, the most-probable continuation becomes <em>"paraphrase what's in front of me" (much lower risk)</em>. You move from maybe ~70% to 95% - <strong>never to 100%</strong>.</p>
<br>
<p>So where does the surety come from? <strong>Not the model - a separate verifier</strong>. The thing that generates the answer must not be the thing that decides it's trustworthy. A grounded model gives you a good draft - 95%; design decides what happens to the other 5%, whether it silently reaches your user or gets caught and blocked.</p>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>You can't make a frozen, sampling-based function promise truth - so reliability <strong>has to</strong> be engineered around it. The model's lack of a guarantee is the reason design exists, not a reason to wait for a better model.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-designing-for-it-actually-looks-like">What “designing for it” actually looks like<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#what-designing-for-it-actually-looks-like" class="hash-link" aria-label="Direct link to What “designing for it” actually looks like" title="Direct link to What “designing for it” actually looks like" translate="no">​</a></h2>
<p>Those four principles become one concrete pipeline. You don't eliminate hallucinations by hoping - you <strong>box it in</strong> with layers, each on catching what the last let through.</p>
<!-- -->
<ul>
<li class=""><strong>Retrieve before you generate</strong> - give the model facts to continue from, not a blank page.</li>
<li class=""><strong>Constrain the output</strong> - structural formats, required citations, schema validation.</li>
<li class=""><strong>Verify against the source</strong> - does everything claim trace back to retrieved evidence?</li>
<li class=""><strong>Make abstention first-class</strong> - "no grounded answer" is a valid, designed outcome, not a failure.</li>
<li class=""><strong>Observe in production</strong> - log groundedness and unsupported claim rates the way you'd log latency, Hallucination is a measurable system metric, not a vibe.</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="how-to-actually-build-the-verifier">How to actually build the verifier<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#how-to-actually-build-the-verifier" class="hash-link" aria-label="Direct link to How to actually build the verifier" title="Direct link to How to actually build the verifier" translate="no">​</a></h2>
<p>"Add a verifier" is easy to say. The trap is building one that just re-asks the same model "are you sure?" - it'll rationalize its own output. A good verifier follows two rules and one ordering.</p>
<p><strong>Rule 1 - independent from the generator.</strong> The thing that <em>checks</em> the answer must not be the thing that wrote it. Use deterministic code, a retrieval system, or a <em>separate</em> model call that sees only the claim + the source - never the original reasoning.</p>
<p><strong>Rule 2 - verify atomic claims, not paragraph</strong> "Mostly right" hides one wrong clause. Decompose the answer into individual facts and check each one against evidence.</p>
<p><strong>The ordering - cheapest, most deterministic checks first, expensive models last, on the reside only:</strong></p>
<!-- -->
<table><thead><tr><th><strong>Layer</strong></th><th><strong>Mechanism</strong></th><th><strong>Catches</strong></th><th><strong>Cost</strong></th></tr></thead><tbody><tr><td><strong>1. Structural</strong></td><td>JSON schema, constrained decoding</td><td>No citations, malformed output</td><td>~Free</td></tr><tr><td><strong>2. Deterministic Facts</strong></td><td>Exact/fuzzy match against source</td><td>Invented numbers, IDs, dates, quotes</td><td>~Free</td></tr><tr><td><strong>3. Grounding (NLI)</strong></td><td>Small entailment model per claim</td><td>Unsupported or contradicted claims</td><td>Cheap</td></tr><tr><td><strong>4. LLM-as-judge</strong></td><td><em>Separate</em> model</td><td>Nuanced cases the rest can't settle</td><td>Expensive</td></tr></tbody></table>
<p>The verifier doesn't make the system perfect. It converts a <em>silent, confident, wrong answer</em> into a caught-and-blocked one - turning an unbounded risk into a <strong>measurable error rate with a fallback</strong>. That conversation is exactly what you can put in front of an auditor.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="where-i-actually-land">Where I actually land<a href="https://jitendersharma.dev/insights/hallucinations-is-a-system-design-problem-not-model-problem#where-i-actually-land" class="hash-link" aria-label="Direct link to Where I actually land" title="Direct link to Where I actually land" translate="no">​</a></h2>
<p>My point is: I'm not saying models don't matter, or that one model is as good as another. Picking a stronger model genuinely lowers the baseline rate.</p>
<br>
<p>I am saying: a better model <strong>reduces</strong> hallucinations; only better <strong>design</strong> lets you <strong>bound and govern</strong> it. If your reliability strategy is "wait for the next model," you've outsourced your most important architectural decision to someone else's release schedule - and you still won't be able to promise an auditor anything.</p>
<br>
<p>Stop asking "which model hallucinates the least?" Start asking <strong>"what in the system owns the truth, and what happens when it doesn't have an answer?"</strong></p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>Hallucination is the model doing its job inside a system that forgot to do its own. Engineer grounding, validation, and abstention around the frozen model - that's where reliability is actually built.</p></div></div>]]></content>
        <author>
            <name>Jitender Sharma</name>
            <uri>https://jitendersharma.dev</uri>
        </author>
        <category label="AI & Intelligence" term="AI & Intelligence"/>
        <category label="Point of View" term="Point of View"/>
        <category label="LLM" term="LLM"/>
        <category label="Hallucinations" term="Hallucinations"/>
    </entry>
    <entry>
        <title type="html"><![CDATA[How LLM Works Under the Hood]]></title>
        <id>https://jitendersharma.dev/insights/how-llm-works-under-the-hood</id>
        <link href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood"/>
        <updated>2026-06-09T00:00:00.000Z</updated>
        <summary type="html"><![CDATA[A 20,000-ft view of the LLM lifecycle and why understanding the four stages matters for enterprise architecture.]]></summary>
        <content type="html"><![CDATA[<p><img decoding="async" loading="lazy" alt="How LLM Works Under the Hood" src="https://jitendersharma.dev/assets/images/transformer-caadd220223ee8d122785364571f04c8.png" width="1030" height="579" class="img_ev3q"></p>
<p>Most discussions about LLMs focus on prompts, tools, and frameworks. However, few explain how the model actually works under the hood and why that matters when building real systems.</p>
<p>This is a 20,000-ft view of the LLM lifecycle in four stages.</p>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-big-picture-one-model-four-stages">The big picture: one model, four stages.<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#the-big-picture-one-model-four-stages" class="hash-link" aria-label="Direct link to The big picture: one model, four stages." title="Direct link to The big picture: one model, four stages." translate="no">​</a></h2>
<p>A model's whole life is just four stages. The shape and vocabulary are fixed first; training only fills in the values, and inference is read-only and never learns.</p>
<!-- -->
<br>
<table><thead><tr><th>Stage</th><th>What happens</th><th>Key ideas</th></tr></thead><tbody><tr><td>Before</td><td>Decide the blueprint</td><td>Architecture dials set the shape, tokenizer builds the vocabulary, and parameter count is fixed.</td></tr><tr><td>During</td><td>Fill in the values</td><td>Random weights become meaningful through training: a four-step loop run millions or trillions of times.</td></tr><tr><td>Alignment</td><td>Make it helpful</td><td>Show good examples (SFT) and teach which answers are better (RLHF/DPO).</td></tr><tr><td>After</td><td>Run it, read-only</td><td>Weights are frozen (no learning); inference traverses the model geometry one token at a time.</td></tr></tbody></table>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>Shape + vocabulary are fixed first. Training only fills the values. Inference never learns.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="stage-1---before-training">Stage 1 - Before training<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#stage-1---before-training" class="hash-link" aria-label="Direct link to Stage 1 - Before training" title="Direct link to Stage 1 - Before training" translate="no">​</a></h2>
<p>Two human decisions are baked in before any gradient is computed.</p>
<ul>
<li class=""><strong>Architecture dials</strong> - hidden size, layers, heads, FFN width, vocab size.</li>
<li class=""><strong>Tokenizer vocabulary</strong> - the integer alphabet the model reads and writes.</li>
</ul>
<p>A "7B" model is 7B because of these dials. Training never grows it, and most parameters live in the FFN, not attention.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-architecture-dials">The Architecture dials<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#the-architecture-dials" class="hash-link" aria-label="Direct link to The Architecture dials" title="Direct link to The Architecture dials" translate="no">​</a></h3>
<table><thead><tr><th>Hyperparameter</th><th>Example</th><th>Description</th></tr></thead><tbody><tr><td>hidden_size(D)</td><td>4096</td><td>How much "thinking space" the model has for each word or idea at a given moment.</td></tr><tr><td>num_layers(L)</td><td>32</td><td>How many rounds of refinement - 32 editors in a row.</td></tr><tr><td>num_heads(H)</td><td>32</td><td>A panel of specialists, each spotting a different pattern.</td></tr><tr><td>head_dim(D_h)</td><td>128</td><td>The size of each specialist's notebook.</td></tr><tr><td>ffn_hidden(D_ff)</td><td>16,384</td><td>The knowledge bank, where most facts are stored (~4*D).</td></tr><tr><td>vocab_size(V)</td><td>32000</td><td>The size of the model's dictionary, the building blocks it uses to read and write language.</td></tr></tbody></table>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>The model is fully sized and described before it sees a single token.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="stage-2---during-training">Stage 2 - During training<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#stage-2---during-training" class="hash-link" aria-label="Direct link to Stage 2 - During training" title="Direct link to Stage 2 - During training" translate="no">​</a></h2>
<p>Learning is one four-step loop, repeated hundreds of thousands to millions of times.</p>
<!-- -->
<ol>
<li class=""><strong>Forward Pass</strong> - Predicts what comes next in a sequence, based on previous tokens.</li>
<li class=""><strong>Loss</strong> - How wrong was our prediction?</li>
<li class=""><strong>Backpropagation</strong> - Calculate how much, and how each weight contributed to the error.</li>
<li class=""><strong>Optimizer step</strong> - Update every weight, slightly adjusting each weigh.</li>
</ol>
<div class="theme-admonition theme-admonition-note admonition_xJq3 alert alert--secondary"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M6.3 5.69a.942.942 0 0 1-.28-.7c0-.28.09-.52.28-.7.19-.18.42-.28.7-.28.28 0 .52.09.7.28.18.19.28.42.28.7 0 .28-.09.52-.28.7a1 1 0 0 1-.7.3c-.28 0-.52-.11-.7-.3zM8 7.99c-.02-.25-.11-.48-.31-.69-.2-.19-.42-.3-.69-.31H6c-.27.02-.48.13-.69.31-.2.2-.3.44-.31.69h1v3c.02.27.11.5.31.69.2.2.42.31.69.31h1c.27 0 .48-.11.69-.31.2-.19.3-.42.31-.69H8V7.98v.01zM7 2.3c-3.14 0-5.7 2.54-5.7 5.68 0 3.14 2.56 5.7 5.7 5.7s5.7-2.55 5.7-5.7c0-3.15-2.56-5.69-5.7-5.69v.01zM7 .98c3.86 0 7 3.14 7 7s-3.14 7-7 7-7-3.12-7-7 3.14-7 7-7z"></path></svg></span>note</div><div class="admonitionContent_BuS1"><p>The only thing learned here is the <strong>next-token prediction</strong>: the statistical relationship between tokens given their surrounding context.
Pre-training delivers languages and knowledge; it does not shape behavior (following instructions, being helpful, staying safe). No behavior is learned at this stage: that comes later, in alignment.</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="from-random-numbers-to-learned-meaning">From random numbers to learned meaning<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#from-random-numbers-to-learned-meaning" class="hash-link" aria-label="Direct link to From random numbers to learned meaning" title="Direct link to From random numbers to learned meaning" translate="no">​</a></h3>
<table><thead><tr><th>Before training (random)</th><th>After training (meaning)</th></tr></thead><tbody><tr><td>Every weight is a random number</td><td>Every weight holds a learned value</td></tr><tr><td>Output is gibberish</td><td>Output is fluent, coherent text</td></tr><tr><td>No grammar, facts, or reasoning</td><td>Grammar, facts, and reasoning emerge</td></tr><tr><td>Structure exists, meaning doesn't</td><td>Same structure: now full of meaning</td></tr></tbody></table>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>Learning is the same four-step loop, running hundreds of thousands to millions of times, turning random numbers into meaning.</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-roles-that-emerge-after-training">The roles that emerge after training<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#the-roles-that-emerge-after-training" class="hash-link" aria-label="Direct link to The roles that emerge after training" title="Direct link to The roles that emerge after training" translate="no">​</a></h3>
<p>Components start as random numbers with no predefined purpose. After millions or billions of training steps, gradient descent gradually shapes them into specialized roles, learned through experience, not explicitly designed.</p>
<table><thead><tr><th>Component</th><th>Role it settles into</th></tr></thead><tbody><tr><td>Embeddings</td><td>What tokens mean (lexical meaning)</td></tr><tr><td>Attention</td><td>How tokens relate: routes relevant context</td></tr><tr><td>FFNs</td><td>Transformation / "thinking". Most parameters and reasoning</td></tr><tr><td>LayerNorm</td><td>Keep signals stable and usable</td></tr><tr><td>Depth (layers)</td><td>Progressive refinement of understanding</td></tr></tbody></table>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>No one designs these roles; training gradually turns them into specialist roles through learning rather than design.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="stage-3---alignment">Stage 3 - Alignment<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#stage-3---alignment" class="hash-link" aria-label="Direct link to Stage 3 - Alignment" title="Direct link to Stage 3 - Alignment" translate="no">​</a></h2>
<p>A raw pre-trained model is a brilliant autocomplete, not yet a helpful assistant. Alignment is a thin, cheap layer on top of pre-training that shapes behavior.</p>
<table><thead><tr><th></th><th>Main training</th><th>Polish (alignment)</th></tr></thead><tbody><tr><td>Data</td><td>Trillions of words</td><td>Thousands to millions of examples</td></tr><tr><td>Length (cost)</td><td>Weeks/months, huge cost</td><td>Short, cheap</td></tr><tr><td>What it does</td><td>Teaches knowledge</td><td>Shapes behavior</td></tr></tbody></table>
<ul>
<li class=""><strong>SFT</strong> - show it good (prompt, response) examples.</li>
<li class=""><strong>RLHF/DPO</strong> - teach it which answer is better.</li>
</ul>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>Alignment turns a raw model into a helpful assistant: it shapes behavior; it doesn't add new knowledge.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="stage-4---after-training">Stage 4 - After training<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#stage-4---after-training" class="hash-link" aria-label="Direct link to Stage 4 - After training" title="Direct link to Stage 4 - After training" translate="no">​</a></h2>
<p>Once training stops, <strong>weights are frozen</strong>: no learning, no gradients. The model is a fixed function <code>f(tokens) -&gt; next token probabilities</code>.</p>
<!-- -->
<p>During inference, the model has <strong>no memory</strong> of what was asked or answered before: each request starts fresh.</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>Training builds the geometry. Inference just navigates it one token at a time.</p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="the-mental-model-most-people-get-wrong">The Mental Model most people get wrong<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#the-mental-model-most-people-get-wrong" class="hash-link" aria-label="Direct link to The Mental Model most people get wrong" title="Direct link to The Mental Model most people get wrong" translate="no">​</a></h2>
<ul>
<li class="">LLM ≠ continuously learning systems</li>
<li class="">LLM ≠ dynamic knowledge base</li>
<li class="">LLM ≠ autonomous agent</li>
</ul>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="what-this-means-for-enterprise-systems">What this means for Enterprise Systems<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#what-this-means-for-enterprise-systems" class="hash-link" aria-label="Direct link to What this means for Enterprise Systems" title="Direct link to What this means for Enterprise Systems" translate="no">​</a></h2>
<p>Understanding how LLMs actually work leads to a critical shift in how we design AI systems. The model itself is not "the system". It's a <strong>fixed component inside a larger architecture</strong>.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="1-why-rag-is-required">1. Why RAG is required<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#1-why-rag-is-required" class="hash-link" aria-label="Direct link to 1. Why RAG is required" title="Direct link to 1. Why RAG is required" translate="no">​</a></h3>
<p>LLMs do not have access to fresh and private data. Their knowledge is fixed at training time.</p>
<p><strong>To make them useful in enterprise:</strong>
<strong>To make them useful in enterprise:</strong></p>
<ul>
<li class="">Connect them to internal data sources</li>
<li class="">Inject context at runtime</li>
</ul>
<p>This is why <strong>Retrieval Augmentation (RAG)</strong> becomes a foundational pattern.</p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="2-why-agentsorchestration-are-external">2. Why agents/orchestration are external<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#2-why-agentsorchestration-are-external" class="hash-link" aria-label="Direct link to 2. Why agents/orchestration are external" title="Direct link to 2. Why agents/orchestration are external" translate="no">​</a></h3>
<p>LLMs are:</p>
<ul>
<li class="">Stateless</li>
<li class="">Reactive</li>
<li class="">Single-step predictors</li>
</ul>
<p>They cannot:</p>
<ul>
<li class="">Execute workflows</li>
<li class="">Maintain long-running state</li>
<li class="">Coordinate systems</li>
</ul>
<p>This is why <strong>agentic systems and orchestration layers exist outside the model</strong></p>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>important</div><div class="admonitionContent_BuS1"><p>The intelligence is in the model and the <strong>control</strong> is in the system design.</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="3-why-governance-is-outside-the-model">3. Why governance is outside the model<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#3-why-governance-is-outside-the-model" class="hash-link" aria-label="Direct link to 3. Why governance is outside the model" title="Direct link to 3. Why governance is outside the model" translate="no">​</a></h3>
<p>You cannot "patch" behavior inside a trained model in real time. Enterprise systems must implement:</p>
<ul>
<li class="">Guardrails</li>
<li class="">Validation layers</li>
<li class="">Monitoring and evaluation</li>
<li class="">Policy enforcement</li>
</ul>
<p>All of these sit <strong>around the model, not inside it</strong></p>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="4-why-inference-cost-dominates">4. Why inference cost dominates<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#4-why-inference-cost-dominates" class="hash-link" aria-label="Direct link to 4. Why inference cost dominates" title="Direct link to 4. Why inference cost dominates" translate="no">​</a></h3>
<p>Training is:</p>
<ul>
<li class="">One-time</li>
<li class="">Expensive but amortized</li>
</ul>
<p>Inference is:
Inference is:</p>
<ul>
<li class="">Continuous</li>
<li class="">Scales with usage</li>
</ul>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>important</div><div class="admonitionContent_BuS1"><p>For enterprise systems:
Cost = traffic * tokens * latency requirements</p></div></div>
<h3 class="anchor anchorTargetStickyNavbar_Vzrq" id="5-why-scale-and-cost-must-be-designed-upfront">5. Why scale and cost must be designed upfront<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#5-why-scale-and-cost-must-be-designed-upfront" class="hash-link" aria-label="Direct link to 5. Why scale and cost must be designed upfront" title="Direct link to 5. Why scale and cost must be designed upfront" translate="no">​</a></h3>
<p>Because LLMs don't learn in production, every interaction requires:</p>
<ul>
<li class="">Full inference execution</li>
<li class="">Token processing (input+output)</li>
<li class="">External system calls (RAG /agents)</li>
</ul>
<p>This means:</p>
<ul>
<li class="">Cost scales with usage, not with training</li>
<li class="">Latency compounds across system layers</li>
<li class="">Poor design = exponential cost growth</li>
</ul>
<p>In real systems, if not handled correctly:</p>
<ul>
<li class="">RAG increases token usage</li>
<li class="">Agents introduce multiple-step execution</li>
<li class="">Orchestration adds round trips</li>
</ul>
<div class="theme-admonition theme-admonition-important admonition_xJq3 alert alert--info"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 14 16"><path fill-rule="evenodd" d="M7 2.3c3.14 0 5.7 2.56 5.7 5.7s-2.56 5.7-5.7 5.7A5.71 5.71 0 0 1 1.3 8c0-3.14 2.56-5.7 5.7-5.7zM7 1C3.14 1 0 4.14 0 8s3.14 7 7 7 7-3.14 7-7-3.14-7-7-7zm1 3H6v5h2V4zm0 6H6v2h2v-2z"></path></svg></span>important</div><div class="admonitionContent_BuS1"><p>Training is a <strong>one-off capital cost</strong>; inference is the <strong>ongoing operational cost</strong>. Also, without careful design, AI systems become <strong>unpredictable and expensive at scale</strong></p></div></div>
<h2 class="anchor anchorTargetStickyNavbar_Vzrq" id="final-takeaway">Final Takeaway<a href="https://jitendersharma.dev/insights/how-llm-works-under-the-hood#final-takeaway" class="hash-link" aria-label="Direct link to Final Takeaway" title="Direct link to Final Takeaway" translate="no">​</a></h2>
<p>The model provides intelligence and the system provides control.</p>
<p>Modern AI architecture is not “LLM design” It is “system design around a frozen model”</p>
<p>Traffic × Tokens × Latency</p>
<div class="theme-admonition theme-admonition-tip admonition_xJq3 alert alert--success"><div class="admonitionHeading_Gvgb"><span class="admonitionIcon_Rf37"><svg viewBox="0 0 12 16"><path fill-rule="evenodd" d="M6.5 0C3.48 0 1 2.19 1 5c0 .92.55 2.25 1 3 1.34 2.25 1.78 2.78 2 4v1h5v-1c.22-1.22.66-1.75 2-4 .45-.75 1-2.08 1-3 0-2.81-2.48-5-5.5-5zm3.64 7.48c-.25.44-.47.8-.67 1.11-.86 1.41-1.25 2.06-1.45 3.23-.02.05-.02.11-.02.17H5c0-.06 0-.13-.02-.17-.2-1.17-.59-1.83-1.45-3.23-.2-.31-.42-.67-.67-1.11C2.44 6.78 2 5.65 2 5c0-2.2 2.02-4 4.5-4 1.22 0 2.36.42 3.22 1.19C10.55 2.94 11 3.94 11 5c0 .66-.44 1.78-.86 2.48zM4 14h5c-.23 1.14-1.3 2-2.5 2s-2.27-.86-2.5-2z"></path></svg></span>TAKEAWAY</div><div class="admonitionContent_BuS1"><p>Treat the LLM as frozen dependency; engineer everything else around it.</p></div></div>]]></content>
        <author>
            <name>Jitender Sharma</name>
            <uri>https://jitendersharma.dev</uri>
        </author>
        <category label="Strategy & Architecture" term="Strategy & Architecture"/>
        <category label="Explainer" term="Explainer"/>
        <category label="LLM" term="LLM"/>
    </entry>
</feed>