We ran OpenClaw at its defaults for three weeks. The projected monthly bill: $87, most of it wasted on a frontier model doing simple file reads.

This guide breaks down every optimization we applied to get that number down to $27/month — without losing quality on the tasks that matter. You’ll get the exact configs, the real cost math, and the security tradeoffs we weighed.

After running this stack for a production SEO operation with 500+ planned posts, here’s what actually moves the needle.

> Quick Navigation: What Is Token Optimization | 5-Tier Model Routing | Heartbeat Config | Prompt Caching | QMD Local Search | Budget Controls | Common Mistakes | FAQ


What Is Token Optimization (And Why Your OpenClaw Bill Is Too High)

Token optimization means spending the least amount of money per task without degrading output quality. In OpenClaw, every message you send, every heartbeat check, every sub-agent call burns tokens. And tokens cost money.

The problem is straightforward. OpenClaw’s default config uses openrouter/auto, which auto-selects models based on availability — not cost. That means your heartbeat (a simple “are you alive?” check that runs every hour) might hit Claude Opus at $15 per million tokens instead of Gemini Flash at $0.10.

OUR MEASURED RESULT

$87 → $27/mo

70% cost reduction with zero quality loss on writing tasks

Here’s what eats your budget:

  • Heartbeats: Run every 55-60 minutes, 24/7. If routed to an expensive model, that’s $15-30/month doing nothing.
  • Context bloat: Workspace files loaded every session. A 420-line AGENTS.md wastes tokens before you even ask a question.
  • Wrong model for the job: Using Opus for CSV parsing is like hiring a lawyer to sort your mail.
  • Cache misses: Dynamic content in system prompts (timestamps, dates) destroys caching and costs 10x more.

Key takeaway: The single biggest lever is model routing. Fix that first and you’ll cut 50-60% immediately.

Related: How I Cut My AI Agent Costs by 70% with Smart Model Routing goes deep on routing alone.


The 5-Tier Model Routing System

This is the config that changed everything. Instead of one model for all tasks, we pin five tiers matched to task complexity.

| Tier | Model | Alias | Cost per 1M Tokens | Use For |
| --- | --- | --- | --- | --- |
| Budget | Gemini 2.0 Flash | fast | $0.10 | Heartbeats, classification, file ops, data extraction |
| Worker | Kimi K2.5 | kimi | $0.60 | SEO analysis, agentic browsing, multi-step reasoning |
| Writer | Claude Sonnet 4.5 | sonnet | $3.00 | All content writing — articles, blog posts, long-form |
| Quality | Claude Sonnet / GPT | sonnet | $3.00 | Complex reasoning, code architecture, security audits |
| Frontier | Claude Opus 4.6 | opus | $15.00 | Only when explicitly requested via /model opus |

The math is simple. If 75% of your tasks run on Flash ($0.10/1M tokens) instead of being auto-routed to Sonnet ($3.00/1M), you save 96.7% on each of those tasks.
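To make that concrete, here is the back-of-the-envelope math in runnable form. The per-million-token prices come from the tier table above; the 20M-tokens-per-month volume and the 75% routable share are purely illustrative assumptions:

```python
# Routing savings sketch. Prices are $/1M tokens from the tier table;
# the monthly volume and routable share are illustrative assumptions.
FLASH_PRICE = 0.10
SONNET_PRICE = 3.00
monthly_tokens_m = 20   # assumed: 20M tokens/month across all tasks
routable_share = 0.75   # assumed: share of tasks simple enough for Flash

cost_all_sonnet = monthly_tokens_m * SONNET_PRICE
cost_routed = (monthly_tokens_m * routable_share * FLASH_PRICE
               + monthly_tokens_m * (1 - routable_share) * SONNET_PRICE)
per_task_savings = 1 - FLASH_PRICE / SONNET_PRICE

print(f"Savings on each routed task: {per_task_savings:.1%}")    # 96.7%
print(f"Monthly: ${cost_all_sonnet:.2f} -> ${cost_routed:.2f}")  # $60.00 -> $16.50
```

Plug in your own volume; the per-task ratio holds regardless of scale.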

💡 Pro Tip

When a user says “use the best model,” that means Sonnet — not Opus. Opus is only for tasks where the user explicitly types /model opus. This one rule prevents most accidental overspend.

Here’s the models section from our actual openclaw.json:

OPENCLAW.JSON — MODELS CONFIG

{
  "agents": {
    "defaults": {
      "model": "google/gemini-2.0-flash-001",
      "models": {
        "fast":   { "id": "google/gemini-2.0-flash-001" },
        "kimi":   { "id": "moonshot/kimi-k2.5" },
        "sonnet": { "id": "anthropic/claude-sonnet-4-5-20250514" },
        "opus":   { "id": "anthropic/claude-opus-4-6" },
        "sonar":  { "id": "perplexity/sonar-pro" }
      }
    }
  }
}

Notice the default is gemini-2.0-flash-001. Every task starts cheap. You escalate to a better model only when the task genuinely demands it.

The Routing Commandments

These rules are baked into our agent instructions:

  1. Heartbeats always use fast. Configured in openclaw.json. Never override.
  2. Sub-agents default to fast. Only escalate if the task needs real reasoning.
  3. Never use opus for automation, cron jobs, or batch processing.
  4. All content writing must use sonnet. Switch with /model sonnet before writing.
  5. SEO analysis uses kimi. Kimi K2.5 excels at agentic browsing tasks.
  6. Batch operations: 10 items per prompt, not 10 separate prompts. Saves 40%.
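The commandments lend themselves to a tiny routing function. This is our own illustrative sketch (the task categories and the choose_tier name are invented for this example; OpenClaw itself routes via the aliases in openclaw.json):

```python
from typing import Optional

# Illustrative router for the commandments above. Categories and the
# function name are our own; this is not OpenClaw's internal logic.
def choose_tier(task_type: str, user_requested: Optional[str] = None) -> str:
    """Return a model alias for a task, defaulting to the cheap tier."""
    if user_requested == "opus":
        return "opus"            # Opus only on an explicit /model opus request
    if task_type in ("heartbeat", "batch", "cron", "file_ops"):
        return "fast"            # Rules 1-3: automation never escalates
    if task_type == "writing":
        return "sonnet"          # Rule 4: all content writing on Sonnet
    if task_type == "seo":
        return "kimi"            # Rule 5: agentic browsing on Kimi K2.5
    return "fast"                # Rule 2: everything else starts cheap

print(choose_tier("heartbeat"))                     # fast
print(choose_tier("writing"))                       # sonnet
print(choose_tier("audit", user_requested="opus"))  # opus
```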

Want the Full Config File?

We’ve open-sourced our complete openclaw.json with all 5 model tiers, agent configs, and caching settings. Grab it from our GitHub repo.

Related: AI Model Showdown for SEO: Gemini Flash vs Sonnet vs Kimi K2.5 compares each model’s quality for SEO-specific tasks.


Heartbeat Configuration

The heartbeat is OpenClaw’s “are you still there?” check. It runs continuously — typically every 55-60 minutes. If you get this wrong, you’re burning money around the clock.

Our Config

OPENCLAW.JSON — HEARTBEAT

"heartbeat": {
  "model": "google/gemini-2.0-flash-001",
  "interval": 55,
  "directPolicy": "allow"
}

Two decisions matter here:

  • Model: Gemini Flash — costs about $1.50/month for 24/7 heartbeats. Opus would cost $15-30/month for the same checks.
  • Interval: 55 minutes — aligns with our 1-hour prompt cache TTL. The heartbeat keeps the cache warm so your next real conversation doesn’t pay full price.
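To see why the heartbeat model choice dominates, here's a rough cost model. The tokens-per-check figure is purely an assumption (real heartbeats carry your system prompt, so your number may be much larger); the point is that the model's price is a direct multiplier on whatever that figure is:

```python
# Rough heartbeat cost model. TOKENS_PER_CHECK is an assumption;
# the model's per-token price multiplies it directly.
CHECKS_PER_DAY = 24 * 60 / 55      # one check every 55 minutes
TOKENS_PER_CHECK = 2_000           # assumed; depends on your prompt size
DAYS = 30

def heartbeat_monthly_cost(price_per_1m: float) -> float:
    total_tokens = CHECKS_PER_DAY * DAYS * TOKENS_PER_CHECK
    return total_tokens / 1_000_000 * price_per_1m

flash = heartbeat_monthly_cost(0.10)
opus = heartbeat_monthly_cost(15.00)
print(f"Flash: ${flash:.2f}/mo, Opus: ${opus:.2f}/mo ({opus / flash:.0f}x)")
```

Whatever your real per-check token count is (check /status), the 150x price gap between Flash and Opus holds.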

Why Not Ollama (Free)?

Some guides recommend running heartbeats on Ollama, a free local model. We tried it. Don’t.

⚠️ Warning — The “3 AM Vulnerability”

Local models like Ollama lack the prompt injection hardening that frontier API models receive during training. The heartbeat runs 24/7 — including at 3 AM when you’re asleep. If it processes a compromised email or webpage, a local model is far more likely to follow malicious instructions. The $1.50/month for Gemini Flash buys you training-level injection resistance.

This isn’t theoretical. Security researchers at Palo Alto Networks have documented prompt injection attacks against AI agents. The heartbeat is a particularly attractive target because it runs unattended.

Related: Securing Your AI Agent: ClawHavoc, CVE-2026-25253 & How We Hardened covers the full security picture.


Prompt Caching and Context Management

Prompt caching gives you a 90% discount on tokens the model has already seen. On Anthropic models through OpenRouter, cached reads cost 10% of normal. But one wrong config destroys it.

Enable Long Cache

OPENCLAW.JSON — CACHING

"params": {
  "cacheRetention": "long",
  "contextTokens": 50000
}

Setting cacheRetention to "long" gives you a 1-hour TTL. Combined with our 55-minute heartbeat interval, the cache stays warm continuously.
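Here's a quick model of what that warm cache is worth over one long session, using the 10%-of-normal cached-read price cited above. The prefix size and turn count are assumptions, and this ignores Anthropic's small surcharge for cache writes:

```python
# What a warm cache is worth over one long session. Prefix size and
# turn count are assumptions; ignores the cache-write surcharge.
SONNET_INPUT = 3.00                  # $/1M input tokens
CACHED_READ = SONNET_INPUT * 0.10    # cached reads cost 10% of normal

system_prefix = 15_000   # assumed static prefix: SOUL.md, AGENTS.md, tools
turns = 40               # assumed number of turns in the session

cost_no_cache = turns * system_prefix / 1e6 * SONNET_INPUT
cost_cached = (system_prefix / 1e6 * SONNET_INPUT            # turn 1 writes the cache
               + (turns - 1) * system_prefix / 1e6 * CACHED_READ)

print(f"Prefix cost, no cache: ${cost_no_cache:.2f}")   # $1.80
print(f"Prefix cost, cached:   ${cost_cached:.2f}")     # $0.22
```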

What Destroys Caching

Three things will kill your cache hit rate:

⚠️ Cache Killers

1. Dynamic timestamps in system prompts. If your workspace files inject “Current Date: March 2, 2026” into the system prompt, every single call has a different prefix. Cache miss every time.

2. Changing SOUL.md or AGENTS.md mid-session. These files form the system prompt. Edit them = invalidate the cache.

3. OpenRouter provider pass-through issues. Some providers don’t forward cache_control headers. Check Issue #9600 if you suspect cache isn’t working.

Trim Your Workspace Files

Every file loaded at session startup eats tokens. We reduced our session load from 420 lines to 158 lines — a 62% reduction.

How we did it:

| File | Before | After | What Changed |
| --- | --- | --- | --- |
| AGENTS.md | 264 lines | 95 lines | Moved group chat rules, heartbeat guide, project context to separate on-demand files |
| IDENTITY.md | 24 lines | 5 lines | Removed template boilerplate, filled in actual values |
| USER.md | 18 lines | 6 lines | Same — removed template, added real info |
| TOOLS.md | 41 lines | 11 lines | Stripped examples, kept only our actual tools |
| BOOTSTRAP.md | 56 lines | Deleted | First-run file; docs say delete after setup |

💡 Pro Tip

Keep static files (SOUL.md, IDENTITY.md) separate from dynamic files (daily memory notes). Static files cache perfectly. Dynamic files should load last so they don’t invalidate the cache prefix of everything before them.

Session Hygiene Commands

These commands are your daily tools for controlling context size:

📝 OpenClaw Commands

/compact   — Compress context when it grows past 30K tokens
/new       — Start fresh session after completing a task
/status    — Check context size, model, and token usage
/model X   — Switch to a specific model tier (e.g., /model sonnet)

Rule of thumb: Run /compact after every major task. Start /new sessions rather than letting context bloat. Check /status before writing — make sure you’re on the right model.


QMD Local Search

QMD (Query Markup Documents) is a local search engine by Tobi Lütke. It uses BM25 + vector search + LLM reranking to find relevant content from your knowledge base — and only injects the relevant snippets, not entire files.

TOKEN REDUCTION

90%

Fewer memory tokens injected per session with QMD vs full file loading

Quick Setup

QMD requires WSL2 on Windows. It does not work on native Windows (missing sqlite-vec binary, tsx module errors).

  1. Install build tools: sudo apt-get install -y build-essential
  2. Install QMD: npm install -g @tobilu/qmd
  3. Verify: qmd --version (should show 1.0.7+)
  4. First run auto-downloads ~2GB of GGUF models (one-time)

Then add this to your openclaw.json:

OPENCLAW.JSON — QMD MEMORY BACKEND

"memory": {
  "backend": "qmd",
  "qmd": {
    "searchMode": "hybrid",
    "includeDefaultMemory": true,
    "paths": ["~/openclaw-workspace/memory"],
    "updateInterval": 300,
    "maxResults": 5
  }
}

💡 Pro Tip

QMD runs 100% locally. No API calls, no data leaves your machine. Search latency is about 47ms per lookup. Install it once your memory files exceed ~2,000 tokens total — before that, full file loading is fine.

Related: Setting Up QMD for Local AI Search: Installation & Real Results covers the full walkthrough including the WSL2 gotchas we hit.


Budget Controls

Even with perfect routing, mistakes happen. A runaway loop, a forgotten /model opus switch, or a sub-agent that escalates unexpectedly. Budget guardrails are your safety net.

OpenRouter Daily Limit

  1. Go to openrouter.ai/settings/limits
  2. Create a guardrail: $3/day
  3. Assign it to your OpenClaw API key

This caps your worst-case at ~$90/month. Our expected spend is $18-27/month, so the $3/day limit gives plenty of headroom for busy days without allowing runaway costs.
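The arithmetic behind that ceiling, using the figures from this section:

```python
# Worst-case spend under the OpenRouter daily guardrail.
DAILY_LIMIT = 3.00
worst_case_month = DAILY_LIMIT * 30   # hard ceiling: $90
expected_high = 27.00                 # our expected upper bound

print(f"Hard ceiling: ${worst_case_month:.0f}/mo")                              # $90/mo
print(f"Headroom over expected spend: {worst_case_month / expected_high:.1f}x") # 3.3x
```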

⚠️ Warning

Without a budget guardrail, a single misconfigured batch job could burn $50+ in one night. We’ve seen reports of users hitting $200+ bills from automation loops that escalated to Opus. Set the guardrail before going live.

Monitoring Workflow

Check these regularly:

  • Daily: Run /status to check context size and current model
  • Weekly: Review OpenRouter dashboard for per-model cost breakdown
  • Monthly: Screenshot dashboard, compare against estimates, adjust tiers if needed

Per-Agent Configuration

Instead of one model for everything, define specialized agents with pinned models and context limits:

| Agent ID | Pinned Model | Context Limit | Purpose |
| --- | --- | --- | --- |
| content-writer | Claude Sonnet | 80K tokens | Article writing, rewrites, content creation |
| seo-analyst | Kimi K2.5 | 50K tokens | SEO audits, keyword research, competitor analysis |
| data-worker | Gemini Flash | 30K tokens | CSV processing, API calls, data extraction |
| Default (all others) | Gemini Flash | 50K tokens | Everything else starts cheap |

OPENCLAW.JSON — AGENT LIST

"list": [
  {
    "id": "content-writer",
    "model": "anthropic/claude-sonnet-4-5-20250514",
    "params": { "contextTokens": 80000 }
  },
  {
    "id": "seo-analyst",
    "model": "moonshot/kimi-k2.5",
    "params": { "contextTokens": 50000 }
  },
  {
    "id": "data-worker",
    "model": "google/gemini-2.0-flash-001",
    "params": { "contextTokens": 30000 }
  }
]

Context limits matter. A data worker processing CSVs doesn’t need 80K tokens of context. Capping it at 30K forces compaction earlier and keeps costs tight.

Compaction itself runs on Flash — don’t waste Sonnet tokens on mechanical text summarization.

OPENCLAW.JSON — COMPACTION

"compaction": {
  "model": "google/gemini-2.0-flash-001"
}

Running a Multi-Agent SEO Operation?

See how we wired OpenClaw + n8n + 10 Python scripts into a full AI SEO stack for $27/month.

Read: Why We Built a $27/mo AI SEO Operation →

Related: Building an SEO Audit Swarm with AI Agents shows how our seo-analyst and data-worker agents work together.


Common Token Optimization Mistakes

Mistake 1: Using Opus for Batch Jobs

Opus ($15/1M tokens) is a reasoning powerhouse. But if you’re processing 50 URLs, extracting titles, or running classification tasks — that’s Flash territory. We’ve seen batch jobs that should cost $0.15 cost $22 because the model wasn’t switched.

Fix: Pin batch and automation tasks to fast. Only escalate if the output quality is measurably bad.

Mistake 2: Timestamps in System Prompts

If your workspace files inject “Current Date and Time: March 2, 2026 14:30:05” into the system prompt, you’ve just invalidated your entire cache. Every call gets a unique prefix. Every call pays full price.

Fix: Keep workspace files 100% static. Let the model infer the date from conversation context, or inject it in the user message (not the system prompt).
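A minimal sketch of the fix, using the generic chat-message shape (the prompt strings are placeholders, and this is not OpenClaw-specific code):

```python
from datetime import date

# Keep the system prompt byte-identical across calls so it caches;
# anything dynamic (like today's date) goes in the user message.
SYSTEM_PROMPT = "You are a helpful SEO assistant."   # static -> cacheable

def build_messages(user_text: str) -> list:
    dated = f"(Today is {date.today().isoformat()}.)\n{user_text}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # identical every call
        {"role": "user", "content": dated},            # dynamic part lives here
    ]

msgs = build_messages("Draft an outline for the Q3 report.")
print(msgs[0]["content"] == SYSTEM_PROMPT)   # True: prefix never changes
```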

Mistake 3: Never Compacting Sessions

OpenClaw sessions grow. A 50-turn conversation about SEO analysis can hit 100K+ tokens. Every subsequent message pays for all that context.

Fix: Run /compact after completing each task. Start /new sessions between unrelated tasks. Check /status regularly — if context exceeds 30K for a simple task, compact immediately.

Mistake 4: Loading Entire Files as Memory

Without QMD, OpenClaw dumps your entire MEMORY.md, all daily notes, and any referenced files directly into context. A 5,000-token memory file is loaded in full even when the conversation only needs one paragraph.

Fix: Install QMD. It returns only the 5 most relevant snippets instead of the entire file. 90% reduction in memory tokens.

Mistake 5: Not Setting a Budget Guardrail

“I’ll monitor it manually” works until it doesn’t. One unattended batch job at 3 AM can blow your monthly budget in a single night.

Fix: Set a $3/day guardrail on OpenRouter immediately. Takes 30 seconds. Prevents the $200 surprise bills that show up on forums regularly.


The Complete Cost Breakdown

Here’s what our operation actually costs with all optimizations applied:

| Category | % of Tasks | Model | Monthly Cost |
| --- | --- | --- | --- |
| Heartbeats & idle checks | ~15% | Gemini Flash | ~$1.50 |
| Data extraction & file ops | ~40% | Gemini Flash | ~$4.00 |
| SEO analysis & browsing | ~15% | Kimi K2.5 | ~$5.00 |
| Content writing | ~25% | Claude Sonnet | ~$14.00 |
| Architecture & debugging | ~5% | Claude Opus | ~$3.00 |
| Total | 100% | Mixed | ~$27.50 |
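The totals check out; here is the same breakdown as a quick sanity check:

```python
# Sanity-check the cost breakdown: shares should sum to 100% and
# per-category costs to the ~$27.50 total.
breakdown = {
    "heartbeats & idle checks":   (0.15, 1.50),
    "data extraction & file ops": (0.40, 4.00),
    "seo analysis & browsing":    (0.15, 5.00),
    "content writing":            (0.25, 14.00),
    "architecture & debugging":   (0.05, 3.00),
}
total_share = sum(share for share, _ in breakdown.values())
total_cost = sum(cost for _, cost in breakdown.values())

print(f"Shares: {total_share:.0%}  Total: ${total_cost:.2f}")  # 100%  $27.50
```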

MONTHLY COST COMPARISON

$27 vs $87

Same quality output. Same number of tasks. Just smarter routing.

The writing budget ($14/month on Sonnet) is non-negotiable. Content quality is what drives SEO rankings. You save everywhere else so you can afford to spend here.

“The teams that understand model routing will build 10x more with the same budget. It’s not about spending less — it’s about spending on the right tokens.”

— Matt Ganzak, OpenClaw Token Optimization Guide, 2026


Security Guardrails (Non-Negotiable)

Token optimization shouldn’t compromise security. These are the guardrails we never disable:

  • 🔒 Heartbeat uses API model with prompt injection resistance (not local Ollama)
  • 🔒 Gateway bound to 127.0.0.1 only — never exposed to the network
  • 🔒 Token-based gateway authentication
  • 🔒 Phone/user allowlist on messaging channel
  • 🔒 HEARTBEAT.md kept empty — minimal attack surface during heartbeat cycles
  • 🔒 Never use local models for tasks involving untrusted content (web scraping, email processing)

Related: Securing Your AI Agent in 2026: ClawHavoc & CVE-2026-25253 covers the ClawHavoc supply chain attack (1,184 malicious skills), the WebSocket RCE, and how we hardened against them.


Frequently Asked Questions

Is OpenClaw free to run?

OpenClaw itself is free and open-source. The cost comes from the AI models it calls through APIs like OpenRouter. With our optimized config, expect $18-27/month for a production SEO operation. A minimal personal assistant setup can run under $5/month on Gemini Flash alone.

Can I use Ollama to make it completely free?

Technically yes, but we don’t recommend it for production. Local models lack the prompt injection hardening of API models. For a personal hobby project with no sensitive data, Ollama is fine. For a business operation handling credentials, emails, and financial data — use API models with training-level security. Gemini Flash at $0.10/1M tokens is nearly free anyway.

How much does Claude Opus cost on OpenClaw?

About $15 per million tokens. In our setup, Opus handles roughly 5% of tasks (architecture decisions, complex debugging, security audits), costing about $3/month. The key is never letting Opus touch routine tasks. A single batch job accidentally routed to Opus can cost more than your entire month of Flash usage.

Does prompt caching work with OpenRouter?

Yes, with caveats. Set cacheRetention: "long" for a 1-hour TTL. Cached reads get a 90% discount on Anthropic models. However, some OpenRouter provider pass-throughs don’t forward cache_control headers properly (see GitHub Issue #9600). Verify by checking that cacheRead > 0 after multiple turns in the same session.

What’s the minimum setup for token optimization?

Three changes that take five minutes:

  1. Set "model": "google/gemini-2.0-flash-001" as default in openclaw.json
  2. Set "heartbeat.model": "google/gemini-2.0-flash-001"
  3. Set a $3/day budget guardrail on OpenRouter

That alone cuts 50-60% off most users’ bills. Add caching, QMD, and per-agent configs later for the remaining savings.

How do I check if my optimizations are working?

Run /status in any OpenClaw conversation. It shows your current model, context size, and token usage. Then check the OpenRouter dashboard for per-model spending breakdown. After 24 hours, verify: heartbeats hit Flash (not Sonnet/Opus), writing tasks hit Sonnet, and daily spend stays under $3.


Getting Started: Your Next Steps

☑ Quick-Start Checklist

  • ☐ Set default model to Gemini Flash in openclaw.json
  • ☐ Pin heartbeat to Gemini Flash at 55-minute interval
  • ☐ Set cacheRetention: "long"
  • ☐ Set $3/day budget guardrail on OpenRouter
  • ☐ Trim workspace files (remove template boilerplate)
  • ☐ Define per-agent models and context limits
  • ☐ Install QMD in WSL2 for local memory search
  • ☐ Run /status and verify after 24 hours

Wherever you are in the rollout, these are the points worth remembering:

🔎 Key Takeaways

  • Model routing is the biggest lever — switching defaults from auto to Gemini Flash cuts 50-60% immediately
  • Heartbeats should use the cheapest API model — not Ollama (security risk) and not your default writer model
  • Prompt caching gives 90% discounts — but only if your system prompts are 100% static
  • QMD reduces memory tokens by 90% — install it once your memory files grow past 2,000 tokens
  • Budget guardrails are non-negotiable — $3/day on OpenRouter prevents surprise bills
  • Our real result: $87/month down to $27/month — same output quality, smarter routing

Explore our complete AI Automation & Workflows hub for more guides on building production AI agent systems.