Eval Plane ④: Reasoning

The Reasoning plane is where the model plans, synthesizes, and decides next steps. Context can be perfect and reasoning can still fail.

THE CLAIM

Reasoning eval separates faithfulness (stays on evidence) from correctness (draws the right conclusion).

What to evaluate

Dimension	Automated check	LLM judge	Human review
Faithfulness to context	Claim ↔ chunk overlap	✓	High-risk
Logical consistency	—	✓	✓
Tool selection	Match expected tool	✓	Edge cases
Uncertainty expression	—	✓	✓
Hallucination	Citation required	✓	✓
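
The "claim ↔ chunk overlap" check in the table can be approximated with plain token overlap. The sketch below is one minimal way to do it, not a prescribed method: the function names, the regex tokenizer, and the 0.6 threshold are all assumptions for illustration, and a real pipeline would likely swap in embeddings or an entailment model.

```python
import re

def tokens(text: str) -> set[str]:
    """Lowercase word tokens; a crude stand-in for a real tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def overlap_score(claim: str, chunk: str) -> float:
    """Fraction of the claim's tokens that also appear in the chunk."""
    claim_tokens = tokens(claim)
    if not claim_tokens:
        return 0.0
    return len(claim_tokens & tokens(chunk)) / len(claim_tokens)

def unsupported_claims(claims: list[str], chunks: list[str],
                       threshold: float = 0.6) -> list[str]:
    """Return claims whose best overlap with any chunk falls below threshold.

    The threshold is a hypothetical default; tune it on labeled examples.
    """
    flagged = []
    for claim in claims:
        best = max((overlap_score(claim, c) for c in chunks), default=0.0)
        if best < threshold:
            flagged.append(claim)
    return flagged

chunks = ["The account balance is 120 dollars as of March."]
claims = ["The balance is 120 dollars", "The user lives in Paris"]
print(unsupported_claims(claims, chunks))
```

Lexical overlap only catches claims that share no vocabulary with the evidence, so flagged claims are candidates for the judge or a human reviewer rather than confirmed failures.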

Scenario	Fixture	Expected
Multi-hop	Two docs needed	Both used in rationale
Distractor	Similar wrong doc in pack	Ignored
Abstain	Thin evidence	"Cannot determine from sources"
Tool choice	"Check balance"	`get_balance` not `transfer`
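
The four scenarios above map naturally onto small pass/fail checks. The following is a sketch under stated assumptions: `ModelOutput` and the `check_*` helpers are hypothetical names, and the output is assumed to expose an answer string, a list of planned tools, and a list of cited document ids. Only `get_balance`, `transfer`, and the abstain wording come from the table.

```python
from dataclasses import dataclass, field

ABSTAIN_TEXT = "Cannot determine from sources"

@dataclass
class ModelOutput:
    """Hypothetical shape of one model response under evaluation."""
    answer: str
    planned_tools: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)

def check_multi_hop(out: ModelOutput, required_docs: list[str]) -> bool:
    """Multi-hop: every required document is cited in the rationale."""
    return set(required_docs) <= set(out.citations)

def check_distractor(out: ModelOutput, distractor_doc: str) -> bool:
    """Distractor: the similar-but-wrong document is not cited."""
    return distractor_doc not in out.citations

def check_abstain(out: ModelOutput) -> bool:
    """Abstain: the answer contains the expected abstention phrase."""
    return ABSTAIN_TEXT.lower() in out.answer.lower()

def check_tool_choice(out: ModelOutput, expected: str, forbidden: str) -> bool:
    """Tool choice: the expected tool is planned and the forbidden one is not."""
    return expected in out.planned_tools and forbidden not in out.planned_tools

out = ModelOutput(
    answer="Your balance is shown below.",
    planned_tools=["get_balance"],
    citations=["doc-a", "doc-b"],
)
print(check_tool_choice(out, expected="get_balance", forbidden="transfer"))
print(check_multi_hop(out, required_docs=["doc-a", "doc-b"]))
```

Substring matching for the abstain case is deliberately loose; a stricter suite might require an exact match or hand the answer to the judge.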

Let the judge use chain-of-thought internally, but store only its structured rationale as JSON, not the raw reasoning.
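
One way to keep the judge's reasoning out of storage is to have it finish with a single JSON line and persist only that. This is a sketch, assuming a judge prompt that ends its output with a JSON object on the last line; the `verdict` and `rationale` keys and the `extract_rationale` helper are hypothetical, not part of any particular judge framework.

```python
import json

REQUIRED_KEYS = ("verdict", "rationale")  # assumed schema, not from the source

def extract_rationale(judge_output: str) -> dict:
    """Keep only the final JSON object; drop the free-form reasoning before it."""
    lines = judge_output.strip().splitlines()
    if not lines:
        raise ValueError("judge returned no output")
    record = json.loads(lines[-1])  # raises ValueError if the last line is not JSON
    if not isinstance(record, dict):
        raise ValueError("judge's final line is not a JSON object")
    missing = [k for k in REQUIRED_KEYS if k not in record]
    if missing:
        raise ValueError(f"judge JSON is missing keys: {missing}")
    # Copy only the expected keys so stray scratch fields are not stored.
    return {k: record[k] for k in REQUIRED_KEYS}

raw = (
    "Step 1: the answer cites doc-2.\n"
    "Step 2: doc-2 does not mention fees.\n"
    '{"verdict": "fail", "rationale": "Claim about fees is not in doc-2."}'
)
print(json.dumps(extract_rationale(raw)))
```

Failing loudly on malformed judge output keeps bad rows out of the results store, at the cost of needing a retry path.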

Record with each result: model_id, prompt_version, reasoning_trace (if exposed), planned_tools, citations
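
The fields listed above could be captured in a small record type. This is a sketch only: the `ReasoningRecord` class name, the field types, and the JSON serialization are assumptions; the field names themselves come from the list.

```python
import json
from dataclasses import dataclass, field, asdict
from typing import Optional

@dataclass
class ReasoningRecord:
    """One logged reasoning-plane result (hypothetical container)."""
    model_id: str
    prompt_version: str
    planned_tools: list[str] = field(default_factory=list)
    citations: list[str] = field(default_factory=list)
    # None when the model does not expose a reasoning trace.
    reasoning_trace: Optional[str] = None

    def to_json(self) -> str:
        """Serialize with sorted keys so stored rows diff cleanly."""
        return json.dumps(asdict(self), sort_keys=True)

record = ReasoningRecord(
    model_id="model-a",          # placeholder values for illustration
    prompt_version="v3",
    planned_tools=["get_balance"],
    citations=["doc-1"],
)
print(record.to_json())
```

Keeping model_id and prompt_version on every row makes it possible to compare reasoning results across model or prompt changes later.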