LLM Bill Triage — Cost-per-Successful-Task Audit
A one-page forensic read of your AI agent bill. $299 fixed fee. Delivered in 5 business days.
Fixed fee: $299
Delivery: 5 days
Avg. recoverable: 60-80%
What you get. Send me 7 days of your usage data (CSV from your provider, or a trace dump from LangSmith / Helicone / Langfuse / vLLM logs). I run the same 4-line shell check from the article plus a 32-rule forensic engine against it, then write back a one-page report that:
- Names the dominant cost-leak shape — retry storm, streaming-abort, agent-of-agents recursion, model-routing overkill, or context-stuffing — with the specific evidence from your top-50 tasks.
- Ranks 3-5 specific fixes for your workload. Not a 20-item generic checklist. Concrete code/config changes ordered by ROI.
- Quotes a new cost-per-successful-task number you should target, with the rough order of magnitude of savings.
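The headline metric in the last bullet can be sketched as total spend divided by the number of tasks that actually succeeded. This is a minimal illustration, not the audit engine itself; the record keys (`task_id`, `cost_usd`, `success`) are assumed names, and your provider's export will likely label them differently.

```python
from collections import defaultdict


def cost_per_successful_task(calls):
    """Total spend divided by the number of tasks that succeeded.

    `calls` is an iterable of dicts with hypothetical keys:
    task_id (str), cost_usd (float), success (bool, True on the call
    that completed the task).
    """
    cost_by_task = defaultdict(float)
    succeeded = set()
    for call in calls:
        cost_by_task[call["task_id"]] += call["cost_usd"]
        if call["success"]:
            succeeded.add(call["task_id"])
    if not succeeded:
        return float("inf")  # money spent, nothing delivered
    # Failed tasks stay in the numerator: their cost is still on the bill.
    return sum(cost_by_task.values()) / len(succeeded)


sample = [
    {"task_id": "t1", "cost_usd": 0.25, "success": False},  # first attempt
    {"task_id": "t1", "cost_usd": 0.25, "success": True},   # retry that worked
    {"task_id": "t2", "cost_usd": 0.50, "success": False},  # never succeeded
]
print(cost_per_successful_task(sample))
```

Dividing by successful tasks rather than by calls is the point: retries and abandoned tasks raise the number even when the per-call price looks fine.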
What the audit catches
Shape 1 — Recursive self-correction loops
The agent calls a tool, the tool returns ambiguous output, the agent calls it again to "verify." Three to seven paid calls per intended one. Mean calls-per-task above 2.5 is the tell.
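The 2.5 calls-per-task tell can be checked with a few lines once each paid call is tagged with the task it belongs to. A rough sketch, assuming you can extract one task identifier per call from your trace (the function names are illustrative):

```python
from collections import Counter

CALLS_PER_TASK_THRESHOLD = 2.5  # the tell quoted above


def mean_calls_per_task(task_ids):
    """task_ids: one entry per paid call, naming the task it served."""
    counts = Counter(task_ids)
    if not counts:
        return 0.0
    return sum(counts.values()) / len(counts)


def looks_like_self_correction_loop(task_ids, threshold=CALLS_PER_TASK_THRESHOLD):
    """True when the mean calls per task exceeds the threshold."""
    return mean_calls_per_task(task_ids) > threshold


calls = ["a", "a", "a", "b", "b", "b", "b"]  # 7 calls across 2 tasks
print(mean_calls_per_task(calls), looks_like_self_correction_loop(calls))
```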
Shape 2 — Streaming-abort-unhonored retries
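Streaming connection drops at 8,000 tokens. Client retries. Both streams billed. stream_options.include_usage is off by default on most inference clients, so streamed responses carry no token usage and the duplicate is invisible in client-side logs. This accounts for 18-32% of total spend on streaming-heavy workloads.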
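One way to surface this shape in a trace dump is to look for a stream that never completed, followed shortly by a request with the same prompt. The sketch below is an assumption-heavy illustration: the keys `prompt_hash`, `started_at` (seconds), `completed`, and `output_tokens` are hypothetical, and the 30-second window is an arbitrary default.

```python
def aborted_then_retried(records, window_s=30.0):
    """Return aborted streams that were retried within `window_s` seconds.

    Each record is a dict with hypothetical keys: prompt_hash, started_at
    (float seconds), completed (bool), output_tokens (int).
    """
    by_hash = {}
    for rec in sorted(records, key=lambda r: r["started_at"]):
        by_hash.setdefault(rec["prompt_hash"], []).append(rec)

    wasted = []
    for group in by_hash.values():
        # Compare each request with the next one for the same prompt.
        for prev, nxt in zip(group, group[1:]):
            gap = nxt["started_at"] - prev["started_at"]
            if not prev["completed"] and gap <= window_s:
                wasted.append(prev)
    return wasted


records = [
    {"prompt_hash": "h1", "started_at": 0.0, "completed": False, "output_tokens": 8000},
    {"prompt_hash": "h1", "started_at": 5.0, "completed": True, "output_tokens": 9000},
    {"prompt_hash": "h2", "started_at": 1.0, "completed": True, "output_tokens": 100},
]
wasted = aborted_then_retried(records)
print(sum(r["output_tokens"] for r in wasted))
```

Summing `output_tokens` over the returned records gives a rough count of tokens billed on streams nobody finished reading.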
Shape 3 — Agent-of-agents recursion
Manager dispatches sub-agents. Each sub-agent's prompt re-serializes the manager's full context. Token count grows super-linearly with depth. 717x worst case documented in 2026 (Predict / Medium).
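A toy model shows why the growth is super-linear. It assumes every agent fans out to the same number of sub-agents and that each level adds a fixed number of tokens on top of the context it inherited; both are simplifications, and the parameter names are illustrative rather than taken from any framework.

```python
def total_prompt_tokens(base, added_per_level, fan_out, depth):
    """Sum of prompt tokens over a full agent tree.

    Toy assumptions: `fan_out` sub-agents per agent, and each level's
    prompt re-serializes everything above it plus `added_per_level` tokens.
    """
    total = 0
    for d in range(depth + 1):
        agents_at_depth = fan_out ** d
        prompt_size = base + d * added_per_level
        total += agents_at_depth * prompt_size
    return total


for depth in range(4):
    print(depth, total_prompt_tokens(base=1000, added_per_level=500, fan_out=3, depth=depth))
```

With these numbers, going from depth 1 to depth 2 more than quadruples the total, because both the number of agents and the size of each prompt increase together.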
Shape 4 — Model-routing overkill
Frontier model set globally on a graph, including three nodes that don't need it. Per-node routing typically saves 5-10x on the highest-frequency nodes, with eval scores held within an agreed delta of the baseline.
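Per-node routing can be as small as a lookup table plus an eval gate. In this sketch the node names, the tier labels `"frontier"` and `"small"`, and the 0.02 delta are all placeholders chosen for illustration, not recommendations or real model IDs.

```python
DEFAULT_MODEL = "frontier"  # placeholder tier label, not a real model ID

# Hypothetical high-frequency nodes that do not need the frontier tier.
NODE_MODELS = {
    "classify_intent": "small",
    "extract_fields": "small",
    "format_reply": "small",
}


def model_for(node_name):
    """Pick the model tier for a node, falling back to the default."""
    return NODE_MODELS.get(node_name, DEFAULT_MODEL)


def routing_change_ok(baseline_score, candidate_score, delta=0.02):
    """Accept a cheaper route only if the eval drop stays within delta."""
    return baseline_score - candidate_score <= delta


print(model_for("classify_intent"), model_for("plan_steps"))
print(routing_change_ok(0.90, 0.89), routing_change_ok(0.90, 0.80))
```

Keeping the gate separate from the table means each downgrade is justified node by node against the same eval set.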
Shape 5 — Context-stuffing
State-graph context serialized into every node's prompt. 1,500 tokens of context, referenced on only 13% of calls. Pure prompt bloat, paid in full every time.
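The arithmetic behind that example is simple to make explicit. The sketch below assumes you can count how many calls actually reference the injected context; how you determine "referenced" is workload-specific and not shown here.

```python
def wasted_context_tokens(context_tokens, total_calls, calls_referencing):
    """Input tokens paid for on calls that never used the injected context."""
    if calls_referencing > total_calls:
        raise ValueError("calls_referencing cannot exceed total_calls")
    return context_tokens * (total_calls - calls_referencing)


# The figures above: 1,500 tokens of context, referenced on 13 of 100 calls.
print(wasted_context_tokens(context_tokens=1500, total_calls=100, calls_referencing=13))
```

Per 100 calls, that is 87 calls carrying 1,500 unused tokens each, or 130,500 input tokens billed for nothing.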
How the engagement works
- You send the data: 7 days of usage (CSV or trace dump). Anything I receive stays in a private working dir and is deleted on delivery.
- I run the audit: 4-line shell check + 32-rule engine + manual read of the top 10 anomalies.
- I write the one-page report: dominant shape, ranked fixes, target cost-per-successful-task.
- You ship the fixes (or hire me to ship them, separately quoted).
What I will and won't do
- Will: read the data, write the report, hand it back in plain text. No call, no upsell, no follow-up sequence.
- Won't: share your data, train on it, or use it as marketing copy. If you want a sanitized example of a Shape 3 leak, I'll use one of the three case studies from the article — not yours.
Order the audit
See the full LLM Bill Triage page
Or drop a sanitized snippet in the comments on the dev.to article and I'll annotate the top-3 leaks for free. Same engine, less narrative.
Milo Antaeus. One operator, one forensic engine, fixed-fee engagements. No SaaS, no subscription, no upsell.