We ran OpenClaw at its defaults for three weeks. The bill? $87 in a single month. Most of it was wasted on a frontier model doing simple file reads.
This guide breaks down every optimization we applied to get that number down to $27/month — without losing quality on the tasks that matter. You’ll get the exact configs, the real cost math, and the security tradeoffs we weighed.
After running this stack for a production SEO operation with 500+ planned posts, here’s what actually moves the needle.
> Quick Navigation: What Is Token Optimization | 5-Tier Model Routing | Heartbeat Config | Prompt Caching | QMD Local Search | Budget Controls | Common Mistakes | FAQ
What Is Token Optimization (And Why Your OpenClaw Bill Is Too High)
Token optimization means spending the least amount of money per task without degrading output quality. In OpenClaw, every message you send, every heartbeat check, every sub-agent call burns tokens. And tokens cost money.
The problem is straightforward. OpenClaw’s default config uses openrouter/auto, which auto-selects models based on availability — not cost. That means your heartbeat (a simple “are you alive?” check that runs every hour) might hit Claude Opus at $15 per million tokens instead of Gemini Flash at $0.10.
OUR MEASURED RESULT
$87 → $27/mo
70% cost reduction with zero quality loss on writing tasks
Here’s what eats your budget:
- Heartbeats: Run every 55-60 minutes, 24/7. If routed to an expensive model, that’s $15-30/month doing nothing.
- Context bloat: Workspace files loaded every session. A 420-line AGENTS.md wastes tokens before you even ask a question.
- Wrong model for the job: Using Opus for CSV parsing is like hiring a lawyer to sort your mail.
- Cache misses: Dynamic content in system prompts (timestamps, dates) destroys caching and costs 10x more.
Key takeaway: The single biggest lever is model routing. Fix that first and you’ll cut 50-60% immediately.
Related: How I Cut My AI Agent Costs by 70% with Smart Model Routing goes deep on routing alone.
The 5-Tier Model Routing System
This is the config that changed everything. Instead of one model for all tasks, we pin five tiers matched to task complexity.
| Tier | Model | Alias | Cost per 1M Tokens | Use For |
|---|---|---|---|---|
| Budget | Gemini 2.0 Flash | fast | $0.10 | Heartbeats, classification, file ops, data extraction |
| Worker | Kimi K2.5 | kimi | $0.60 | SEO analysis, agentic browsing, multi-step reasoning |
| Writer | Claude Sonnet 4.5 | sonnet | $3.00 | All content writing — articles, blog posts, long-form |
| Quality | Claude Sonnet / GPT | sonnet | $3.00 | Complex reasoning, code architecture, security audits |
| Frontier | Claude Opus 4.6 | opus | $15.00 | Only when explicitly requested via /model opus |
The math is simple. If 75% of your tasks run on Flash ($0.10) instead of auto-routed to Sonnet ($3.00), you save 96% on those tasks.
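To make that arithmetic concrete, here is a small Python sketch. The per-1M prices come from the tier table above; the 75% routed share and the 10M-token monthly volume are illustrative assumptions, not measured figures:

```python
# Illustrative cost math for routing a share of tasks to a cheaper tier.
# Prices are per 1M tokens, matching the tier table above.
FLASH_PRICE = 0.10   # Gemini 2.0 Flash
SONNET_PRICE = 3.00  # Claude Sonnet 4.5

def routed_cost(total_tokens_m: float, cheap_share: float) -> float:
    """Cost when `cheap_share` of tokens run on Flash, the rest on Sonnet."""
    return (total_tokens_m * cheap_share * FLASH_PRICE
            + total_tokens_m * (1 - cheap_share) * SONNET_PRICE)

# Savings on the routed portion alone: 1 - 0.10/3.00, roughly 96.7%
savings_on_routed = 1 - FLASH_PRICE / SONNET_PRICE
print(f"Savings on Flash-routed tasks: {savings_on_routed:.1%}")

# Blended example: 10M tokens/month, 75% routed to Flash
print(f"All-Sonnet: ${routed_cost(10, 0.0):.2f} vs routed: ${routed_cost(10, 0.75):.2f}")
```

Note the blended bill drops from $30.00 to $8.25 in this example even though a quarter of the traffic still runs on Sonnet.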
💡 Pro Tip
When a user says “use the best model,” that means Sonnet — not Opus. Opus is only for tasks where the user explicitly types /model opus. This one rule prevents most accidental overspend.
Here’s the models section from our actual openclaw.json:
OPENCLAW.JSON — MODELS CONFIG
```json
{
  "agents": {
    "defaults": {
      "model": "google/gemini-2.0-flash-001",
      "models": {
        "fast": { "id": "google/gemini-2.0-flash-001" },
        "kimi": { "id": "moonshot/kimi-k2.5" },
        "sonnet": { "id": "anthropic/claude-sonnet-4-5-20250514" },
        "opus": { "id": "anthropic/claude-opus-4-6" },
        "sonar": { "id": "perplexity/sonar-pro" }
      }
    }
  }
}
```

Notice the default is `gemini-2.0-flash-001`. Every task starts cheap. You escalate to a better model only when the task genuinely demands it.
The Routing Commandments
These rules are baked into our agent instructions:
- Heartbeats always use `fast`. Configured in openclaw.json. Never override.
- Sub-agents default to `fast`. Only escalate if the task needs real reasoning.
- Never use `opus` for automation, cron jobs, or batch processing.
- All content writing must use `sonnet`. Switch with `/model sonnet` before writing.
- SEO analysis uses `kimi`. Kimi K2.5 excels at agentic browsing tasks.
- Batch operations: 10 items per prompt, not 10 separate prompts. Saves 40%.
Want the Full Config File?
We’ve open-sourced our complete openclaw.json with all 5 model tiers, agent configs, and caching settings. Grab it from our GitHub repo.
Related: AI Model Showdown for SEO: Gemini Flash vs Sonnet vs Kimi K2.5 compares each model’s quality for SEO-specific tasks.
Heartbeat Configuration
The heartbeat is OpenClaw’s “are you still there?” check. It runs continuously — typically every 55-60 minutes. If you get this wrong, you’re burning money around the clock.
Our Config
OPENCLAW.JSON — HEARTBEAT
```json
"heartbeat": {
  "model": "google/gemini-2.0-flash-001",
  "interval": 55,
  "directPolicy": "allow"
}
```

Two decisions matter here:
- ➤ Model: Gemini Flash — costs about $1.50/month for 24/7 heartbeats. Opus would cost $15-30/month for the same checks.
- ➤ Interval: 55 minutes — aligns with our 1-hour prompt cache TTL. The heartbeat keeps the cache warm so your next real conversation doesn’t pay full price.
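A rough estimator for the heartbeat line item. The tokens-per-check value here is an assumption picked to land near the ~$1.50 figure; measure your real context size with /status and substitute it:

```python
# Rough monthly heartbeat cost: calls/month x tokens/call x price per 1M.
# tokens_per_call is an ASSUMPTION; check yours with /status.
def heartbeat_cost(interval_min: int, tokens_per_call: int,
                   price_per_1m: float, days: int = 30) -> float:
    calls = days * 24 * 60 / interval_min  # ~785 calls/month at 55 min
    return calls * tokens_per_call / 1_000_000 * price_per_1m

flash = heartbeat_cost(55, 19_000, 0.10)
print(f"Flash heartbeats: ~${flash:.2f}/mo")
# Whatever the token count, Opus at $15/1M is 150x Flash's $0.10/1M.
```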
Why Not Ollama (Free)?
Some guides recommend running heartbeats on Ollama, a free local model. We tried it. Don’t.
⚠️ Warning — The “3 AM Vulnerability”
Local models like Ollama lack the prompt injection hardening that frontier API models receive during training. The heartbeat runs 24/7 — including at 3 AM when you’re asleep. If it processes a compromised email or webpage, a local model is far more likely to follow malicious instructions. The $1.50/month for Gemini Flash buys you training-level injection resistance.
This isn’t theoretical. Security researchers at Palo Alto Networks have documented prompt injection attacks against AI agents. The heartbeat is a particularly attractive target because it runs unattended.
Related: Securing Your AI Agent: ClawHavoc, CVE-2026-25253 & How We Hardened covers the full security picture.
Prompt Caching and Context Management
Prompt caching gives you a 90% discount on tokens the model has already seen. On Anthropic models through OpenRouter, cached reads cost 10% of normal. But one wrong config destroys it.
Enable Long Cache
OPENCLAW.JSON — CACHING
```json
"params": {
  "cacheRetention": "long",
  "contextTokens": 50000
}
```

Setting `cacheRetention` to `"long"` gives you a 1-hour TTL. Combined with our 55-minute heartbeat interval, the cache stays warm continuously.
What Destroys Caching
Three things will kill your cache hit rate:
⚠️ Cache Killers
1. Dynamic timestamps in system prompts. If your workspace files inject “Current Date: March 2, 2026” into the system prompt, every single call has a different prefix. Cache miss every time.
2. Changing SOUL.md or AGENTS.md mid-session. These files form the system prompt. Edit them = invalidate the cache.
3. OpenRouter provider pass-through issues. Some providers don’t forward cache_control headers. Check Issue #9600 if you suspect cache isn’t working.
Trim Your Workspace Files
Every file loaded at session startup eats tokens. We reduced our session load from 420 lines to 158 lines — a 62% reduction.
How we did it:
| File | Before | After | What Changed |
|---|---|---|---|
| AGENTS.md | 264 lines | 95 lines | Moved group chat rules, heartbeat guide, project context to separate on-demand files |
| IDENTITY.md | 24 lines | 5 lines | Removed template boilerplate, filled in actual values |
| USER.md | 18 lines | 6 lines | Same — removed template, added real info |
| TOOLS.md | 41 lines | 11 lines | Stripped examples, kept only our actual tools |
| BOOTSTRAP.md | 56 lines | Deleted | First-run file, docs say delete after setup |
💡 Pro Tip
Keep static files (SOUL.md, IDENTITY.md) separate from dynamic files (daily memory notes). Static files cache perfectly. Dynamic files should load last so they don’t invalidate the cache prefix of everything before them.
Session Hygiene Commands
These commands are your daily tools for controlling context size:
📝 OpenClaw Commands
- `/compact` — Compress context when it grows past 30K tokens
- `/new` — Start fresh session after completing a task
- `/status` — Check context size, model, and token usage
- `/model X` — Switch to a specific model tier (e.g., `/model sonnet`)
Rule of thumb: Run /compact after every major task. Start /new sessions rather than letting context bloat. Check /status before writing — make sure you’re on the right model.
QMD Local Search
QMD (Query Markup Documents) is a local search engine by Tobi Lutke. It uses BM25 + vector search + LLM reranking to find relevant content from your knowledge base — and only injects the relevant snippets, not entire files.
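QMD's internals aren't reproduced here, but hybrid retrieval generally follows the pattern in this sketch: score documents with BM25 and vector similarity separately, min-max normalize each list, then fuse with a weighted sum before reranking. The weights and toy scores below are illustrative, not QMD's actual values:

```python
# Illustrative hybrid-search score fusion (NOT QMD's actual code).
# Assume BM25 and vector-similarity scores are already computed per document.
def normalize(scores: dict) -> dict:
    """Min-max normalize a {doc: score} mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def hybrid_rank(bm25: dict, vec: dict, alpha: float = 0.5, k: int = 5):
    """Weighted fusion of keyword and semantic scores; return top-k docs."""
    b, v = normalize(bm25), normalize(vec)
    fused = {d: alpha * b[d] + (1 - alpha) * v[d] for d in bm25}
    return sorted(fused, key=fused.get, reverse=True)[:k]

# Toy scores for three memory snippets
bm25 = {"notes-jan.md": 7.1, "notes-feb.md": 2.3, "todo.md": 0.4}
vec = {"notes-jan.md": 0.62, "notes-feb.md": 0.88, "todo.md": 0.15}
print(hybrid_rank(bm25, vec, k=2))
```

Only the winning snippets get injected into context, which is where the token savings come from.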
TOKEN REDUCTION
90%
Fewer memory tokens injected per session with QMD vs full file loading
Quick Setup
QMD requires WSL2 on Windows. It does not work on native Windows (missing sqlite-vec binary, tsx module errors).
- Install build tools: `sudo apt-get install -y build-essential`
- Install QMD: `npm install -g @tobilu/qmd`
- Verify: `qmd --version` (should show 1.0.7+)
- First run auto-downloads ~2GB of GGUF models (one-time)
Then add this to your openclaw.json:
OPENCLAW.JSON — QMD MEMORY BACKEND
```json
"memory": {
  "backend": "qmd",
  "qmd": {
    "searchMode": "hybrid",
    "includeDefaultMemory": true,
    "paths": ["~/openclaw-workspace/memory"],
    "updateInterval": 300,
    "maxResults": 5
  }
}
```

💡 Pro Tip
QMD runs 100% locally. No API calls, no data leaves your machine. Search latency is about 47ms per lookup. Install it once your memory files exceed ~2,000 tokens total — before that, full file loading is fine.
Related: Setting Up QMD for Local AI Search: Installation & Real Results covers the full walkthrough including the WSL2 gotchas we hit.
Budget Controls
Even with perfect routing, mistakes happen. A runaway loop, a forgotten /model opus switch, or a sub-agent that escalates unexpectedly. Budget guardrails are your safety net.
OpenRouter Daily Limit
- Go to openrouter.ai/settings/limits
- Create a guardrail: $3/day
- Assign it to your OpenClaw API key
This caps your worst-case at ~$90/month. Our expected spend is $18-27/month, so the $3/day limit gives plenty of headroom for busy days without allowing runaway costs.
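The worst-case arithmetic, spelled out:

```python
# Worst-case monthly spend under a daily guardrail vs. expected spend.
DAILY_CAP = 3.00
worst_case = DAILY_CAP * 30        # $90/month absolute ceiling
expected = (18 + 27) / 2           # midpoint of the $18-27 estimate
headroom = worst_case / expected   # ~4x room for busy days
print(f"ceiling ${worst_case:.0f}/mo, ~{headroom:.0f}x expected spend")
```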
⚠️ Warning
Without a budget guardrail, a single misconfigured batch job could burn $50+ in one night. We’ve seen reports of users hitting $200+ bills from automation loops that escalated to Opus. Set the guardrail before going live.
Monitoring Workflow
Check these regularly:
- ✔ Daily: Run `/status` to check context size and current model
- ✔ Weekly: Review OpenRouter dashboard for per-model cost breakdown
- ✔ Monthly: Screenshot dashboard, compare against estimates, adjust tiers if needed
Per-Agent Configuration
Instead of one model for everything, define specialized agents with pinned models and context limits:
| Agent ID | Pinned Model | Context Limit | Purpose |
|---|---|---|---|
| `content-writer` | Claude Sonnet | 80K tokens | Article writing, rewrites, content creation |
| `seo-analyst` | Kimi K2.5 | 50K tokens | SEO audits, keyword research, competitor analysis |
| `data-worker` | Gemini Flash | 30K tokens | CSV processing, API calls, data extraction |
| Default (all others) | Gemini Flash | 50K tokens | Everything else starts cheap |
OPENCLAW.JSON — AGENT LIST
```json
"list": [
  {
    "id": "content-writer",
    "model": "anthropic/claude-sonnet-4-5-20250514",
    "params": { "contextTokens": 80000 }
  },
  {
    "id": "seo-analyst",
    "model": "moonshot/kimi-k2.5",
    "params": { "contextTokens": 50000 }
  },
  {
    "id": "data-worker",
    "model": "google/gemini-2.0-flash-001",
    "params": { "contextTokens": 30000 }
  }
]
```

Context limits matter. A data worker processing CSVs doesn’t need 80K tokens of context. Capping it at 30K forces compaction earlier and keeps costs tight.
Compaction itself runs on Flash — don’t waste Sonnet tokens on mechanical text summarization.
OPENCLAW.JSON — COMPACTION
```json
"compaction": {
  "model": "google/gemini-2.0-flash-001"
}
```

Running a Multi-Agent SEO Operation?
See how we wired OpenClaw + n8n + 10 Python scripts into a full AI SEO stack for $27/month.
Related: Building an SEO Audit Swarm with AI Agents shows how our seo-analyst and data-worker agents work together.
Common Token Optimization Mistakes
Mistake 1: Using Opus for Batch Jobs
Opus ($15/1M tokens) is a reasoning powerhouse. But if you’re processing 50 URLs, extracting titles, or running classification tasks — that’s Flash territory. We’ve seen batch jobs that should cost $0.15 cost $22 because the model wasn’t switched.
Fix: Pin batch and automation tasks to fast. Only escalate if the output quality is measurably bad.
Mistake 2: Timestamps in System Prompts
If your workspace files inject “Current Date and Time: March 2, 2026 14:30:05” into the system prompt, you’ve just invalidated your entire cache. Every call gets a unique prefix. Every call pays full price.
Fix: Keep workspace files 100% static. Let the model infer the date from conversation context, or inject it in the user message (not the system prompt).
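One way to apply that fix, sketched with a generic chat-messages structure (not OpenClaw's internal format): the system prompt stays byte-identical across calls, so the cached prefix survives, and the date rides along in the user message.

```python
# Keep the cacheable system prompt static; put volatile values (like the
# date) in the user message. Generic chat format used for illustration.
from datetime import date

SYSTEM_PROMPT = "You are a writing assistant for our SEO workspace."  # static

def build_messages(user_text: str) -> list:
    stamped = f"(Today is {date.today().isoformat()}.) {user_text}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # identical every call
        {"role": "user", "content": stamped},          # volatile part lives here
    ]

a = build_messages("Draft an outline for the caching article.")
b = build_messages("Now expand section 2.")
print(a[0] == b[0])  # True: the shared prefix stays cacheable
```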
Mistake 3: Never Compacting Sessions
OpenClaw sessions grow. A 50-turn conversation about SEO analysis can hit 100K+ tokens. Every subsequent message pays for all that context.
Fix: Run /compact after completing each task. Start /new sessions between unrelated tasks. Check /status regularly — if context exceeds 30K for a simple task, compact immediately.
Mistake 4: Loading Entire Files as Memory
Without QMD, OpenClaw dumps your entire MEMORY.md, all daily notes, and any referenced files directly into context. A 5,000-token memory file is loaded in full even when the conversation only needs one paragraph.
Fix: Install QMD. It returns only the 5 most relevant snippets instead of the entire file. 90% reduction in memory tokens.
Mistake 5: Not Setting a Budget Guardrail
“I’ll monitor it manually” works until it doesn’t. One unattended batch job at 3 AM can blow your monthly budget in a single night.
Fix: Set a $3/day guardrail on OpenRouter immediately. Takes 30 seconds. Prevents the $200 surprise bills that show up on forums regularly.
The Complete Cost Breakdown
Here’s what our operation actually costs with all optimizations applied:
| Category | % of Tasks | Model | Monthly Cost |
|---|---|---|---|
| Heartbeats & idle checks | ~15% | Gemini Flash | ~$1.50 |
| Data extraction & file ops | ~40% | Gemini Flash | ~$4.00 |
| SEO analysis & browsing | ~15% | Kimi K2.5 | ~$5.00 |
| Content writing | ~25% | Claude Sonnet | ~$14.00 |
| Architecture & debugging | ~5% | Claude Opus | ~$3.00 |
| Total | 100% | Mixed | ~$27.50 |
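The table's line items can be sanity-checked in a few lines:

```python
# Verify the cost-breakdown table: line items should sum to the stated total.
line_items = {
    "heartbeats_idle": 1.50,       # Gemini Flash
    "data_extraction": 4.00,       # Gemini Flash
    "seo_analysis": 5.00,          # Kimi K2.5
    "content_writing": 14.00,      # Claude Sonnet
    "architecture_debugging": 3.00 # Claude Opus
}
total = sum(line_items.values())
print(f"${total:.2f}")  # $27.50, matching the table's total row
```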
MONTHLY COST COMPARISON
$27 vs $87
Same quality output. Same number of tasks. Just smarter routing.
The writing budget ($14/month on Sonnet) is non-negotiable. Content quality is what drives SEO rankings. You save everywhere else so you can afford to spend here.
“The teams that understand model routing will build 10x more with the same budget. It’s not about spending less — it’s about spending on the right tokens.”
— Matt Ganzak, OpenClaw Token Optimization Guide, 2026
Security Guardrails (Non-Negotiable)
Token optimization shouldn’t compromise security. These are the guardrails we never disable:
- 🔒 Heartbeat uses API model with prompt injection resistance (not local Ollama)
- 🔒 Gateway bound to `127.0.0.1` only — never exposed to the network
- 🔒 Token-based gateway authentication
- 🔒 Phone/user allowlist on messaging channel
- 🔒 HEARTBEAT.md kept empty — minimal attack surface during heartbeat cycles
- 🔒 Never use local models for tasks involving untrusted content (web scraping, email processing)
Related: Securing Your AI Agent in 2026: ClawHavoc & CVE-2026-25253 covers the ClawHavoc supply chain attack (1,184 malicious skills), the WebSocket RCE, and how we hardened against them.
Frequently Asked Questions
Is OpenClaw free to run?
OpenClaw itself is free and open-source. The cost comes from the AI models it calls through APIs like OpenRouter. With our optimized config, expect $18-27/month for a production SEO operation. A minimal personal assistant setup can run under $5/month on Gemini Flash alone.
Can I use Ollama to make it completely free?
Technically yes, but we don’t recommend it for production. Local models lack the prompt injection hardening of API models. For a personal hobby project with no sensitive data, Ollama is fine. For a business operation handling credentials, emails, and financial data — use API models with training-level security. Gemini Flash at $0.10/1M tokens is nearly free anyway.
How much does Claude Opus cost on OpenClaw?
About $15 per million tokens. In our setup, Opus handles roughly 5% of tasks (architecture decisions, complex debugging, security audits), costing about $3/month. The key is never letting Opus touch routine tasks. A single batch job accidentally routed to Opus can cost more than your entire month of Flash usage.
Does prompt caching work with OpenRouter?
Yes, with caveats. Set cacheRetention: "long" for a 1-hour TTL. Cached reads get a 90% discount on Anthropic models. However, some OpenRouter provider pass-throughs don’t forward cache_control headers properly (see GitHub Issue #9600). Verify by checking that cacheRead > 0 after multiple turns in the same session.
What’s the minimum setup for token optimization?
Three changes that take five minutes:
- Set `"model": "google/gemini-2.0-flash-001"` as default in openclaw.json
- Set `"heartbeat.model": "google/gemini-2.0-flash-001"`
- Set a $3/day budget guardrail on OpenRouter
That alone cuts 50-60% off most users’ bills. Add caching, QMD, and per-agent configs later for the remaining savings.
How do I check if my optimizations are working?
Run /status in any OpenClaw conversation. It shows your current model, context size, and token usage. Then check the OpenRouter dashboard for per-model spending breakdown. After 24 hours, verify: heartbeats hit Flash (not Sonnet/Opus), writing tasks hit Sonnet, and daily spend stays under $3.
Getting Started: Your Next Steps
☑ Quick-Start Checklist
- ☐ Set default model to Gemini Flash in openclaw.json
- ☐ Pin heartbeat to Gemini Flash at 55-minute interval
- ☐ Set `cacheRetention: "long"`
- ☐ Trim workspace files (remove template boilerplate)
- ☐ Define per-agent models and context limits
- ☐ Install QMD in WSL2 for local memory search
- ☐ Run `/status` and verify after 24 hours
Here’s where to go based on your situation:
- 🚀 Just getting started? Apply the 3 quick fixes from the FAQ above. Takes 5 minutes, saves 50%.
- 🚀 Want to go deeper? Read How I Cut Costs by 70% with Model Routing for the full routing breakdown.
- 🚀 Building an SEO operation? See Why We Built a $27/mo AI SEO Operation for the complete stack.
- 🚀 Concerned about security? Start with our security hardening guide before optimizing for cost.
🔎 Key Takeaways
- Model routing is the biggest lever — switching defaults from auto to Gemini Flash cuts 50-60% immediately
- Heartbeats should use the cheapest API model — not Ollama (security risk) and not your default writer model
- Prompt caching gives 90% discounts — but only if your system prompts are 100% static
- QMD reduces memory tokens by 90% — install it once your memory files grow past 2,000 tokens
- Budget guardrails are non-negotiable — $3/day on OpenRouter prevents surprise bills
- Our real result: $87/month down to $27/month — same output quality, smarter routing
Explore our complete AI Automation & Workflows hub for more guides on building production AI agent systems.
