4 docs tagged with "llm"

Eval Plane ④: Reasoning

How to evaluate the Reasoning plane — faithfulness to context, conclusion quality, tool selection, and multi-step logic.

Further Reading (External): Eval Engineering

Curated third-party articles, guides, and tool docs on LLM and agent evaluation — mapped to the Eval Framework Blueprint series.

LLM-as-Judge: Scaled Eval With Calibration

How to deploy LLM-as-judge for plane-aware evaluation — rubric design, judge selection, bias controls, and calibration against human ground truth.

PGAR Boundary ③: LLM Proposal

Tool schemas only, proposal rather than permission, and authority kept out of the model boundary.