AI Observability In Enterprise

· 7 min read
Jitender Sharma
Software Architect

Everyone says "monitor your AI in production". Almost nobody draws the system that does it. "Add Observability" is a slogan until you can say exactly what gets captured, where it lands, how long it lives, and who reads it.

This is an architecture breakdown: capture in the request path, fan-out into purpose-built storage tiers, and four very different consumers reading off them. The headline: AI observability isn't one thing. It's five signals with five retention policies feeding four jobs, and the regulator-facing ones look nothing like the dashboard-facing ones.

THE CLAIM

AI observability is not "a dashboard". It's a capture-and-retention architecture: each signal (logs, metrics, traces, raw prompts, audit records) has a different consumer, a different retention window, and a different blast radius if you get it wrong.
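
To make the claim concrete, here is a minimal sketch of per-signal retention and fan-out. The five signal names come from the claim above; everything else (the `SignalPolicy` type, the store names, the retention windows, and the consumer labels) is a hypothetical placeholder chosen for illustration, not a recommendation.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class SignalPolicy:
    store: str           # hypothetical storage tier name
    retention_days: int  # hypothetical retention window
    consumer: str        # hypothetical primary reader


# All values are illustrative assumptions; real windows depend on
# your cost, privacy, and regulatory constraints.
POLICIES = {
    "logs": SignalPolicy("log_index", 30, "debugging"),
    "metrics": SignalPolicy("timeseries_db", 395, "dashboards"),
    "traces": SignalPolicy("trace_store", 14, "debugging"),
    "raw_prompts": SignalPolicy("restricted_object_store", 7, "evaluation"),
    "audit_records": SignalPolicy("append_only_archive", 2555, "audit"),
}


def fan_out(event: dict) -> dict:
    """Split one captured request event into per-store records.

    `event` maps a signal name to its payload. Each payload is routed
    to the store named by its policy and tagged with its retention.
    """
    routed = defaultdict(list)
    for signal, payload in event.items():
        policy = POLICIES.get(signal)
        if policy is None:
            # Refuse to store anything that has no explicit policy.
            raise KeyError(f"no retention policy for signal {signal!r}")
        routed[policy.store].append(
            {
                "signal": signal,
                "payload": payload,
                "retention_days": policy.retention_days,
            }
        )
    return dict(routed)
```

Rejecting a signal with no policy is a deliberate choice in this sketch: it forces the "where it lands, how long it lives, who reads it" decision to be made before anything is captured.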

Hallucinations Is a System Design Problem, Not a Model Problem

· 8 min read
Jitender Sharma
Software Architect

Every time a model invents a citation, the conversation jumps to "which model hallucinates less?" That's the wrong question: the model did exactly what it was built to do.

The thing that will actually decide whether your AI system is trustworthy is the architecture you wrap around the model – grounding, retrieval, validation, and an explicit path to "I don't know".

A hallucination isn't a bug the next checkpoint will patch. It's the expected behavior of a frozen, probabilistic next-token predictor asked a question it has no grounded answer for. Treating it as a model defect means you keep waiting for a fix that isn't coming. Treating it as a design problem means you can actually solve it today.

THE CLAIM

Hallucination is not the model failing. It's the model succeeding at the wrong objective – fluent continuation – in a system that never gave it the right one: grounded truth.
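
As a rough sketch of what "grounding, retrieval, validation, and an explicit path to 'I don't know'" might look like in code, consider the wrapper below. The `retrieve` and `generate` callables, the `min_score` threshold, and the substring-based validation are all assumptions made for illustration; a real system would plug in its own retriever, model call, and a much stronger support check.

```python
from typing import Callable, List, Tuple

Passage = Tuple[str, float]  # (text, relevance score in [0, 1])

FALLBACK = "I don't know"


def grounded_answer(
    question: str,
    retrieve: Callable[[str], List[Passage]],    # hypothetical retriever
    generate: Callable[[str, List[str]], str],   # hypothetical model call
    min_score: float = 0.5,                      # assumed relevance cutoff
) -> str:
    """Answer only when retrieved sources support the answer."""
    # Retrieval: keep only passages that clear the relevance cutoff.
    passages = [p for p in retrieve(question) if p[1] >= min_score]
    if not passages:
        # Nothing to ground on: take the explicit fallback path.
        return FALLBACK

    sources = [text for text, _ in passages]
    # Grounding: the model sees the sources, not just the question.
    draft = generate(question, sources)

    # Validation: a crude stand-in that accepts the draft only if it
    # appears verbatim (case-insensitively) in at least one source.
    supported = any(draft.lower() in s.lower() for s in sources)
    return draft if supported else FALLBACK
```

The point of the sketch is the control flow, not the checks themselves: the model is never the last step, and there are two separate exits to "I don't know".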

How LLM Works Under the Hood

· 7 min read
Jitender Sharma
Software Architect

Most discussions about LLMs focus on prompts, tools, and frameworks. However, few explain how the model actually works under the hood and why that matters when building real systems.

This is a 20,000-ft view of the LLM lifecycle in four stages.

The big picture: one model, four stages.

A model's whole life is just four stages. The shape and vocabulary are fixed first; training only fills in the values, and inference is read-only and never learns.


| Stage | What happens | Key ideas |
| --- | --- | --- |
| Before | Decide the blueprint | Architecture dials set the shape, tokenizer builds the vocabulary, and parameter count is fixed. |
| During | Fill in the values | Random weights become meaningful through training: a four-step loop run millions or trillions of times. |
| Alignment | Make it helpful | Show good examples (SFT) and teach which answers are better (RLHF/DPO). |
| After | Run it, read-only | Weights are frozen (no learning); inference traverses the model geometry one token at a time. |

TAKEAWAY

Shape + vocabulary are fixed first. Training only fills the values. Inference never learns.
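
The "read-only, one token at a time" idea can be illustrated with a deliberately tiny toy. The score table below is a hypothetical stand-in for frozen weights, and the loop looks only at the previous token, whereas a real LLM conditions on the whole context; the sketch is meant to show the shape of the loop, not how a transformer computes scores.

```python
from types import MappingProxyType

# Hypothetical "frozen weights": next-token scores keyed by the
# previous token. MappingProxyType makes the top level read-only.
WEIGHTS = MappingProxyType({
    "<s>": {"the": 0.9, "a": 0.1},
    "the": {"model": 0.7, "cat": 0.3},
    "model": {"predicts": 0.8, "</s>": 0.2},
    "predicts": {"</s>": 1.0},
})


def generate(weights, start="<s>", max_tokens=10):
    """Greedy decoding: pick the highest-scoring next token each step.

    The function only reads from `weights`; nothing is updated, which
    is the sense in which inference "never learns".
    """
    tokens = []
    current = start
    for _ in range(max_tokens):
        scores = weights.get(current)
        if not scores:
            break  # no known continuation for this token
        current = max(scores, key=scores.get)
        if current == "</s>":
            break  # end-of-sequence marker
        tokens.append(current)
    return tokens


print(generate(WEIGHTS))  # ['the', 'model', 'predicts']
```

Running the function twice gives the same output because nothing in the table changes between calls; any variation in a real system comes from sampling or a different prompt, not from the weights moving.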