I spent $788 on an AI coding agent in one day. Here's the breakdown.

I left an AI coding agent running for one day. Then I read the invoice: $788, in about 13 hours.

I'm posting the real breakdown because I think a lot of people are quietly running up this kind of bill without seeing where it goes, and the fix is boring and effective.

One day, 10:21–23:05. 11 sessions, 3,572 API calls across 4 models:

| Model (input/output price per 1M tokens) | Calls | Output tokens | Cache-read tokens | Cost |
|---|---|---|---|---|
| Fable 5 ($10/$50) | 2,613 | 1.04M | 448M | ~$617 |
| Opus 4.8 ($5/$25) | 671 | 769K | 248M | ~$168 |
| Haiku 4.5 ($1/$5) | 242 | 27K | 9M | ~$1.70 |
| Sonnet 4.6 ($3/$15) | 46 | 6K | 2M | ~$0.90 |
| Total | 3,572 | | | ~$788 |

Two numbers reframed how I think about this:

- The flagship ate $617 by itself: 78% of the bill from one model I'd set as the default for everything.
- Haiku did 242 real calls for $1.70. A coffee. For work that, honestly, looked a lot like the work I was paying the flagship $0.24/call to do.

That's not a 2× or 3× gap. Per call it's roughly a 34× difference ($0.24 vs. about $0.007), and in total spend it's ~360× ($617 vs. $1.70). I was sending almost everything to the expensive end out of pure default-laziness.

Notice 448M + 248M = ~700M cache-read tokens. Agentic coding re-sends a big context every turn; cache reads are billed at ~0.1× input, which is the only reason this was $788 and not several thousand. The flip side: anything that breaks your cache (a changed timestamp, a reordered tool list, a proxy that normalizes prompts) silently re-bills at full input price. On this volume, a broken cache is a 10× event.

I didn't conclude "stop using good models." I concluded "stop sending everything to them." The pattern:

- Cheap model by default. Classification, file edits, boilerplate, retrieval: a fast small model handles these fine.
- Escalate on signal. Hard reasoning, ambiguous specs, failed attempts → bump to the flagship.
- Cap it. Per-key budgets so a runaway loop trips a limit instead of your card.
- Watch the cache. Keep the prompt prefix byte-stable so cache reads actually hit.
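To make the pattern above concrete, here is a minimal Python sketch of its four pieces. Everything in it is illustrative rather than anything from my actual setup: the model identifiers, the `Task` fields, the set of escalation signals, and the helper names (`pick_model`, `KeyBudget`, `stable_prefix`) are all hypothetical, and a real gateway would enforce budgets server-side rather than in-process.

```python
import json
from dataclasses import dataclass

# Hypothetical model identifiers; substitute whatever your provider or gateway expects.
CHEAP_MODEL = "small-fast-model"
FLAGSHIP_MODEL = "flagship-model"

# Assumed escalation signals, mirroring "hard reasoning, ambiguous specs".
ESCALATE_KINDS = {"hard_reasoning", "ambiguous_spec"}


@dataclass
class Task:
    kind: str                 # e.g. "classification", "file_edit", "hard_reasoning"
    failed_attempts: int = 0  # how many times the cheap model already failed


def pick_model(task: Task) -> str:
    """Cheap by default; escalate only on an explicit signal."""
    if task.kind in ESCALATE_KINDS or task.failed_attempts > 0:
        return FLAGSHIP_MODEL
    return CHEAP_MODEL


class BudgetExceeded(RuntimeError):
    """Raised when a charge would push a key past its cap."""


@dataclass
class KeyBudget:
    limit_usd: float
    spent_usd: float = 0.0

    def charge(self, cost_usd: float) -> None:
        """Record spend, refusing any charge that would cross the cap."""
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(f"cap of ${self.limit_usd:.2f} reached")
        self.spent_usd += cost_usd


def stable_prefix(system_prompt: str, tools: list[dict]) -> str:
    """Build a prompt prefix that is byte-identical regardless of tool order."""
    ordered = sorted(tools, key=lambda tool: tool["name"])
    return system_prompt + "\n" + json.dumps(ordered, sort_keys=True, separators=(",", ":"))
```

The point of `stable_prefix` is that the serialized tool list no longer depends on the order the caller happened to pass it in, which removes one of the silent cache-breakers mentioned earlier. It does nothing about timestamps or other volatile text inside the system prompt itself; those still need to be kept out of the prefix.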
This is exactly what an AI gateway / model router does: it's the layer that lets you express "cheap by default, escalate when it's hard" once, instead of hard-coding a model everywhere. I've since taken the flagship out of the default path, and the same workload now lands in the low tens of dollars a day.

While digging into routing I built an open-source, pain-point-organized list of AI gateways, with a reproducible cost benchmark that prices concrete workloads (including a coding scenario with reasoning tokens) across 11 models, computed by a unit-tested script. Plug in your own token mix and see your real number before the invoice does: github.com/cuihuan/awesome-ai-gateway · interactive cost tables

If you're running agents daily: have you actually looked at your per-model breakdown? I'd bet most of the bill is one model doing work a cheaper one could.
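If you want to run that check on your own numbers, the arithmetic is short. The sketch below uses the per-model figures from this post and assumes the listed prices are USD per 1M tokens and that cache reads cost 0.1× the input price; the dictionary layout and function names are made up for illustration and are not taken from the benchmark script in the repo.

```python
# Figures copied from the breakdown in this post.
# Assumption: prices are USD per 1M tokens; cache-read tokens are in millions.
USAGE = {
    "Fable 5":    {"input_per_m": 10.0, "calls": 2613, "cache_read_m": 448.0, "cost": 617.00},
    "Opus 4.8":   {"input_per_m": 5.0,  "calls": 671,  "cache_read_m": 248.0, "cost": 168.00},
    "Haiku 4.5":  {"input_per_m": 1.0,  "calls": 242,  "cache_read_m": 9.0,   "cost": 1.70},
    "Sonnet 4.6": {"input_per_m": 3.0,  "calls": 46,   "cache_read_m": 2.0,   "cost": 0.90},
}

CACHE_READ_FACTOR = 0.1  # "~0.1x input", as stated in the post


def cost_per_call(row: dict) -> float:
    """Average billed cost of one API call for a model."""
    return row["cost"] / row["calls"]


def cache_read_cost(row: dict, factor: float = CACHE_READ_FACTOR) -> float:
    """Cost of the cache-read tokens at a given fraction of the input price."""
    return row["cache_read_m"] * row["input_per_m"] * factor


def full_price_cost(row: dict) -> float:
    """What the same tokens would cost if every cache read missed."""
    return cache_read_cost(row, factor=1.0)


per_call_ratio = cost_per_call(USAGE["Fable 5"]) / cost_per_call(USAGE["Haiku 4.5"])
cached_total = sum(cache_read_cost(row) for row in USAGE.values())
uncached_total = sum(full_price_cost(row) for row in USAGE.values())

print(f"flagship vs. small model, per call: {per_call_ratio:.1f}x")
print(f"cache reads as billed:              ${cached_total:,.2f}")
print(f"same tokens at full input price:    ${uncached_total:,.2f}")
```

With the figures above, the flagship comes out at about $0.24 per call against about $0.007 for Haiku, roughly 34×, and the cache-read tokens account for about $574 of the bill versus about $5,735 if every one of them had been billed as fresh input.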
