11 signals, not 9. Two new ones catch what v3 missed.

Why does an agent that ran up a $4,200 bill in a 47-iteration loop still score 9 of 9 on instrumentation? Because the v3 grader was missing the two highest-blast-radius failure shapes of 2026: intent drift and unbounded agent loops. The free grader went from 9 to 11 signals on 2026-06-05. The $149 forensic read applies the same 11 to your full production archive.

The 2 new signals v3 was missing

Intent drift

The agent follows a plausible sub-goal that has drifted from the original request by step 4. The customer gets a 2,000-word answer to "what's my account balance." Detected by the absence of agent.reaffirm_intent / intent_hash in later steps. Audit-pool finding: 9 of 14 production archives had this (64%).
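The absence check above can be sketched roughly as follows. This is a minimal illustration, not the grader's actual logic: only the names agent.reaffirm_intent and intent_hash come from the text, while the trace schema (step, event), the function name, and the later_from threshold are assumptions made for the example.

```python
from typing import Any, Iterable, Mapping


def lacks_intent_reaffirmation(
    steps: Iterable[Mapping[str, Any]], later_from: int = 4
) -> bool:
    """Return True if no step at or after `later_from` re-anchors to the intent.

    A step counts as re-anchoring when it is an `agent.reaffirm_intent`
    event or carries a non-empty `intent_hash` field. The schema is hypothetical.
    """
    later = [s for s in steps if s.get("step", 0) >= later_from]
    if not later:
        return False  # trace too short to judge drift
    return not any(
        s.get("event") == "agent.reaffirm_intent" or s.get("intent_hash")
        for s in later
    )


# Illustrative traces (made-up data, not from a real archive).
drifting = [
    {"step": 1, "event": "agent.capture_intent", "intent_hash": "a1"},
    {"step": 2, "event": "tool.call"},
    {"step": 4, "event": "tool.call"},
    {"step": 5, "event": "agent.respond"},
]
anchored = drifting[:3] + [
    {"step": 5, "event": "agent.reaffirm_intent", "intent_hash": "a1"}
]
print(lacks_intent_reaffirmation(drifting))  # True
print(lacks_intent_reaffirmation(anchored))  # False
```

A missing reaffirmation does not prove the agent drifted; it only means the log cannot show that it did not.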

Agent-loop budget-burn

The agent gets stuck on a sub-task and calls the same tool 40 times. LangGraph, CrewAI, and AutoGen ship generic default iteration limits (tens of steps, varying by framework and version), not task-fit caps. Worst observed: a $4,200 bill from one 47-iteration web_search loop on a 3-call task. Audit-pool finding: 6 of 14 archives had this (43%).
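One way to spot this shape in a trace is to count calls per task and tool and compare them with a task-fit cap. The sketch below assumes the trace has already been reduced to (task_id, tool) pairs; the function name, the budget mapping, and the default cap of 5 are all hypothetical, and the sample data simply mirrors the 47-iteration web_search case described above.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def over_budget_tools(
    calls: Iterable[Tuple[str, str]],
    budget: Dict[str, int],
    default_cap: int = 5,
) -> List[Tuple[str, str, int, int]]:
    """Return (task_id, tool, call_count, cap) for each tool over its cap in a task."""
    counts = Counter(calls)  # keyed by (task_id, tool)
    return sorted(
        (task, tool, n, budget.get(tool, default_cap))
        for (task, tool), n in counts.items()
        if n > budget.get(tool, default_cap)
    )


# Illustrative data: 47 web_search calls on a task budgeted for 3.
calls = [("ord_4471", "web_search")] * 47 + [("ord_4471", "fetch_order")]
print(over_budget_tools(calls, budget={"web_search": 3}))
# [('ord_4471', 'web_search', 47, 3)]
```

The point of a per-tool budget is that the cap reflects what the task should need, rather than whatever ceiling the framework happens to default to.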

What the 9 v3 signals still cover

Execution envelope (signals 1-7)

Intent capture, tool-call outcome, retry storm, outcome-assertion, side-effect timestamp, idempotency, prompt-injection shapes. Same as v3. The $149 read covers these in depth.

Cost side (signals 8-9)

Cost-per-outcome per task, context-stuffing (the 2026 #1 cause of $5K-$50K surprise bills). The $299 LLM Bill Triage covers these in depth (60 days of LLM spend).
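As a rough illustration of the cost-per-outcome idea, the sketch below uses one plausible reading: total spend across all tasks divided by the number of tasks that succeeded. That definition, the field names cost_usd and succeeded, and the sample figures are assumptions for the example, not the grader's definition.

```python
from typing import Any, Iterable, Mapping, Optional


def cost_per_outcome(tasks: Iterable[Mapping[str, Any]]) -> Optional[float]:
    """Total spend across all tasks divided by the number that succeeded.

    Returns None when nothing succeeded, since the ratio is undefined.
    """
    tasks = list(tasks)
    total = sum(t["cost_usd"] for t in tasks)
    successes = sum(1 for t in tasks if t["succeeded"])
    return total / successes if successes else None


# Made-up figures: three tasks, one of which failed after spending the most.
tasks = [
    {"task": "a", "cost_usd": 0.25, "succeeded": True},
    {"task": "b", "cost_usd": 0.75, "succeeded": False},
    {"task": "c", "cost_usd": 0.50, "succeeded": True},
]
print(cost_per_outcome(tasks))  # 0.75
```

Here the average cost per task is $0.50, but each successful outcome effectively cost $0.75 because the failed task still burned tokens.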

What you get for $149

Within 24 hours of receiving your traces (one week of traffic, in any format: LangSmith export, JSONL, raw OTLP, whatever you have), you receive:

The same 11-signal checklist applied to your real traffic, with timestamps and trace-line citations.
Per-signal pass/fail with the specific log line(s) that drove the decision (e.g. "iter=12/50 on tool=search at task=ord_4471, $0.31 of the $4.20 bill attributable to this single task").
One concrete log-line change per missing signal (most are 1-5 lines of code in your tool wrapper).
A 30-minute async review call to walk through it.
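For a sense of what a 1-5 line log change in a tool wrapper can look like, here is a hedged sketch that emits lines in the iter=12/50 shape quoted above and stops at a cap. The wrapper name, the exception class, and the logger name are hypothetical; this is not tied to any specific framework's API.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("agent.tools")  # hypothetical logger name


class IterationBudgetExceeded(RuntimeError):
    """Raised when a task calls one tool more often than its cap allows."""


def with_iteration_log(
    tool: Callable[..., Any], name: str, task_id: str, cap: int
) -> Callable[..., Any]:
    """Wrap `tool` so every call logs its position against the cap."""
    calls = 0

    def wrapped(*args: Any, **kwargs: Any) -> Any:
        nonlocal calls
        if calls >= cap:
            raise IterationBudgetExceeded(
                f"tool={name} task={task_id} hit cap={cap}"
            )
        calls += 1
        # The log line itself: one call, greppable key=value pairs.
        logger.info("iter=%d/%d tool=%s task=%s", calls, cap, name, task_id)
        return tool(*args, **kwargs)

    return wrapped


# Usage with a stand-in tool; a fourth call would raise IterationBudgetExceeded.
search = with_iteration_log(
    lambda q: f"results for {q}", "search", "ord_4471", cap=3
)
for _ in range(3):
    search("refund status")
```

Keeping the counter in the wrapper rather than in the agent loop means the log line and the cap cannot disagree about how many calls were made.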

What it costs

Buy the $149 forensic read
one-time · USD · 24h delivery · invoice via email

Run the free 11-signal grader first
no signup · 30 sec · in your browser

What you don't get

I am not a vendor. I am not a dashboard. I am not a $300/month observability platform. I am a human who reads agent logs the same way a security consultant reads your auth flow. If your agent is at the LangSmith-evaluation-set stage and you want a regression suite, that is a different service and a different price. This is the read.

If you have a real production incident in the next 90 days, you can apply the 11-signal checklist yourself with the free grader. If you would rather have a second pair of eyes from someone who has read hundreds of these, the link is the same as it has been for the last 12 months: $149, results or refund.