# Zero-Cost AI Agent Stack: Cloudflare Workers + Gemini Web = 24/7 Free AI

I built a fully autonomous AI agent that costs $0/month to run. Here is exactly how.

When you build an AI agent, the first thing you reach for is an API: OpenAI, DeepSeek, Groq. They work great until you check the bill. Even at $0.14 per million tokens, a moderately active agent burns through $30-50/month. That is money you could have been saving toward a GPU.

The second problem: most conversations do not need a frontier model. When a customer asks "How much is wedding photography?", you do not need GPT-5. You need a FAQ lookup and a friendly reply. Yet API pricing charges the same per-token rate whether you are writing a novel or answering "What are your hours?"

I wanted something better, so I hacked together a stack that runs completely free:

Cloudflare Workers (free tier) → Gemini Web (free) → 24/7 AI Agent

Here is the architecture and how you can build your own. The stack has three pieces:

- **Gemini Web.** Google Gemini's web interface at gemini.google.com is free: no API key, no token counting, and no published rate limits. The model is powerful enough for 90% of customer service tasks. The catch is that it is a web page, not an API. But we can fix that.
- **Cloudflare Workers.** The Workers free tier gives you 100,000 requests/day. That is more than enough for a small business AI agent handling ~500 customer chats daily.
- **Chrome DevTools Protocol (CDP).** We use CDP to control a browser that talks to Gemini Web. A small Node.js proxy translates standard OpenAI-compatible API requests into browser actions.
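The translation step in that last piece can be sketched in a few lines. This is a minimal sketch, not the author's proxy code: it assumes the incoming body follows the OpenAI chat completions shape (a `messages` array of `{ role, content }` objects), and the helper names `toPrompt` and `toChatCompletion` as well as the `gemini-web` model label are hypothetical.

```javascript
// Hypothetical helpers; not taken from the author's proxy.

// Flatten an OpenAI-style `messages` array into a single prompt string
// that can be typed into a chat box. Non-user roles are tagged so the
// model can still tell system instructions from the customer's text.
function toPrompt(body) {
  const messages = Array.isArray(body?.messages) ? body.messages : [];
  return messages
    .filter((m) => typeof m.content === "string" && m.content.trim() !== "")
    .map((m) => (m.role === "user" ? m.content : `[${m.role}] ${m.content}`))
    .join("\n\n");
}

// Wrap plain reply text in the shape of a chat completion response,
// so existing OpenAI-compatible clients can read it unchanged.
function toChatCompletion(replyText, model = "gemini-web") {
  return {
    id: `chatcmpl-${Date.now()}`,
    object: "chat.completion",
    created: Math.floor(Date.now() / 1000),
    model,
    choices: [
      {
        index: 0,
        message: { role: "assistant", content: replyText },
        finish_reason: "stop",
      },
    ],
  };
}
```

With helpers like these, the proxy's request handler only has to call `toPrompt`, pass the result to whatever drives the browser, and return `toChatCompletion(reply)` as JSON.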
The request flow:

```
Customer Message
  ↓
Cloudflare Worker (/api/chat)
  ↓
Node.js Proxy (:57322)
  ↓
Chrome CDP → types into gemini.google.com
  ↓
Extracts reply → returns as API response
```

This is the core: a server that accepts OpenAI-format requests and forwards them to Gemini Web via Playwright:

```javascript
const { chromium } = require("playwright");

async function getGeminiReply(prompt) {
  // Attach to an already running Chrome over CDP and find the Gemini tab
  const browser = await chromium.connectOverCDP("http://127.0.0.1:9222");
  const page = browser.contexts()[0].pages()
    .find(p => p.url().includes("gemini"));
  if (!page) throw new Error("No open Gemini tab found");

  // Type and send
  const tb = page.getByRole("textbox", { name: /enter a prompt/i }).first();
  await tb.fill(prompt);
  await page.getByRole("button", { name: /send message/i }).click();

  // Wait for completion: poll once per second (up to 80 s) until the
  // "Stop generating" button is no longer on the page
  for (let i = 0; i < 80; i++) {
    await page.waitForTimeout(1000);
    const hasStop = await page.evaluate(() =>
      !!document.querySelector('[aria-label="Stop generating"]')
    );
    if (!hasStop) break;
  }

  // Extract and return the text of the last message on the page
  return await page.evaluate(() => {
    const msgs = document.querySelectorAll("message-content");
    return msgs[msgs.length - 1]?.innerText || "";
  });
}
```

Not every question needs Gemini. FAQ matching handles 70% of queries instantly and for free:

```javascript
const FAQ = [
  // keywords: "price", "how much"; answer: "Basic package starts at ¥1999..."
  { keywords: ["价格", "多少钱"], a: "基础套餐 ¥1999 起..." },
  // keywords: "booking", "availability"; answer: "Just book 3-7 days in advance..."
  { keywords: ["预约", "档期"], a: "提前3-7天预约即可..." },
  // keywords: "address", "where"; answer: "We are in Jinjiang District, Chengdu..."
  { keywords: ["地址", "哪里"], a: "我们在成都锦江区..." }
];

function matchFAQ(message) {
  for (const item of FAQ) {
    if (item.keywords.some(k => message.includes(k))) {
      return item.a; // Instant, no API call
    }
  }
  return null; // Fall through to Gemini
}
```

The Worker also stores customer contacts in Cloudflare KV:

```javascript
// env.LEADS is the KV namespace binding; each lead is stored under a random UUID
async function handleLead(request, env) {
  const { name, contact, need } = await request.json();
  await env.LEADS.put(crypto.randomUUID(), JSON.stringify({
    name, contact, need,
    created_at: new Date().toISOString()
  }));
  return Response.json({ ok: true });
}
```

I deployed this for a photography studio in 2 hours.
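For reference, the FAQ-first routing described above could be wired into the Worker's chat route roughly as follows. This is a sketch under stated assumptions rather than the deployed code: the article does not say how the Worker reaches the Node.js proxy, so `env.PROXY_URL` (a publicly reachable address for the proxy, for example through a tunnel), the `/v1/chat/completions` path, the `{ message }` request shape, and the helper names `planReply`, `handleChat`, and `json` are all hypothetical. The FAQ entries are English stand-ins for the sample data.

```javascript
// Hypothetical Worker-side routing; route names and env bindings are assumptions.

// English stand-ins for the article's sample FAQ entries.
const FAQ = [
  { keywords: ["price", "how much"], a: "Basic package starts at ¥1999." },
  { keywords: ["book", "availability"], a: "Book 3-7 days in advance." },
];

function matchFAQ(message) {
  const text = message.toLowerCase();
  const hit = FAQ.find((item) => item.keywords.some((k) => text.includes(k)));
  return hit ? hit.a : null;
}

// Decide where a reply should come from, without doing any I/O.
function planReply(message) {
  const canned = matchFAQ(message);
  return canned !== null
    ? { source: "faq", reply: canned }
    : { source: "model", prompt: message };
}

// Build a JSON response with the given status code.
function json(body, status = 200) {
  return new Response(JSON.stringify(body), {
    status,
    headers: { "content-type": "application/json" },
  });
}

// Handler for POST /api/chat with a body like { "message": "..." }.
async function handleChat(request, env) {
  const { message } = await request.json();
  if (typeof message !== "string" || message.trim() === "") {
    return json({ error: "message is required" }, 400);
  }

  const plan = planReply(message);
  if (plan.source === "faq") {
    return json({ reply: plan.reply, source: "faq" }); // no model call
  }

  // Assumed: the proxy exposes an OpenAI-compatible endpoint at PROXY_URL.
  const upstream = await fetch(`${env.PROXY_URL}/v1/chat/completions`, {
    method: "POST",
    headers: { "content-type": "application/json" },
    body: JSON.stringify({ messages: [{ role: "user", content: plan.prompt }] }),
  });
  if (!upstream.ok) return json({ error: "upstream unavailable" }, 502);

  const data = await upstream.json();
  return json({ reply: data.choices?.[0]?.message?.content ?? "", source: "model" });
}
```

Keeping `planReply` free of I/O makes the FAQ path easy to test on its own. In a module-style Worker, `handleChat` would be called from the `fetch` method of the default export.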
Here is what happened:

| Metric | Before | After |
|---|---|---|
| Customer response time | 2-6 hours | Instant |
| Missed inquiries (overnight) | ~40% | 0% |
| Owner time spent on FAQs | 3h/day | 30min/day |
| Monthly cost | ¥0 (owner's labor) | ¥0 (fully free) |

The live demo is at: https://ihug-demo.wigginsbuck7.workers.dev/

Takeaways:

- **FAQ first:** 70% of small business inquiries are repeat questions. Cache those locally.
- **Gemini is good enough:** For the 30% that need AI, Gemini Web handles it. No frontier model is required for "When are you open?"
- **Workers are essentially free:** At 100k requests/day, you will not hit the limit serving a small business.
- **KV for persistence:** Cloudflare KV stores leads, so you get a mini-CRM for free.

Limitations:

- **Single concurrent request:** The CDP proxy handles one query at a time. For a photography studio, this is fine, because you do not get 20 simultaneous chats. For high traffic, use a queue or parallel browsers.
- **Google login required:** Someone must log into Gemini once. After that, cookies persist.
- **Google could change the DOM:** If Gemini updates its UI, the proxy needs updating. But this is a weekend project, not a startup dependency.
- **Rate unclear:** Google does not publish rate limits for the web UI. Be reasonable and do not pump 10,000 queries/day through it.

The agent stack is now almost entirely free:

```
Codex Desktop → Router (:57323)
    → < 4000 tokens → Gemini Web (FREE)
    → ≥ 4000 tokens → DeepSeek API (paid, but rare)
Info Pipeline → GitHub API + Hacker News + Dev.to (all free)
Cloudflare KV → Lead storage (free tier)
Chrome CDP    → Browser automation (free)
```

Total monthly spend: ¥0 for the free components, plus roughly ¥15 to ¥30 for DeepSeek on long replies.

To build your own:

1. Clone the proxy: it is ~200 lines of JavaScript.
2. Create a Cloudflare Worker: copy the template above.
3. Start Chrome with `--remote-debugging-port=9222`.
4. Deploy and share the URL.

The entire setup takes one afternoon. After that, it runs indefinitely at zero cost.

If you found this useful, the demo is live at ihug-demo.wigginsbuck7.workers.dev. Drop a test inquiry: an AI will answer you, and it costs me nothing.
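Two of the ideas above lend themselves to short sketches: the 4000-token routing rule and the suggestion to put a queue in front of the single-request CDP proxy. Neither block is the author's router. The 4000-token threshold comes from the stack diagram, while the names `estimateTokens`, `pickBackend`, and `createSerialQueue`, the backend labels, and the rough "4 characters per token" estimate (which fits English text better than Chinese) are assumptions made for illustration.

```javascript
// Hypothetical router and queue helpers; only the 4000-token threshold
// is taken from the article.
const TOKEN_THRESHOLD = 4000;

// Very rough token estimate: about 4 characters per token (an assumption).
function estimateTokens(text) {
  return Math.ceil(text.length / 4);
}

// Short prompts go to the free backend, long ones to the paid API.
function pickBackend(prompt, threshold = TOKEN_THRESHOLD) {
  return estimateTokens(prompt) < threshold ? "gemini-web" : "deepseek-api";
}

// Run async jobs strictly one at a time, in arrival order. This suits a
// proxy that drives a single browser tab and cannot interleave chats.
function createSerialQueue() {
  let tail = Promise.resolve();
  return function enqueue(job) {
    const result = tail.then(() => job());
    // Swallow errors on the internal chain so one failed job does not
    // block the jobs queued behind it; the caller still sees the error.
    tail = result.catch(() => {});
    return result;
  };
}
```

A proxy could create one queue at startup and wrap every browser call in it, for example `enqueue(() => getGeminiReply(prompt))`, so that simultaneous requests wait their turn instead of typing into the same tab at once.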
