We ran OpenClaw at its defaults for three weeks. The bill? $87 in a single month. Most of it was wasted on a frontier model doing simple file reads.
This guide breaks down every optimization we applied to get that number down to $27/month — without losing quality on the tasks that matter. You’ll get the exact configs, the real cost math, and the security tradeoffs we weighed.
After running this stack for a production SEO operation with 500+ planned posts, here’s what actually moves the needle.
> Quick Navigation: What Is Token Optimization | 5-Tier Model Routing | Heartbeat Config | Prompt Caching | QMD Local Search | Budget Controls | Common Mistakes | FAQ
What Is Token Optimization (And Why Your OpenClaw Bill Is Too High)
Token optimization means spending the least amount of money per task without degrading output quality. In OpenClaw, every message you send, every heartbeat check, every sub-agent call burns tokens. And tokens cost money.
The problem is straightforward. OpenClaw’s default config uses openrouter/auto, which auto-selects models based on availability — not cost. That means your heartbeat (a simple “are you alive?” check that runs every hour) might hit Claude Opus at $15 per million tokens instead of Gemini Flash at $0.10.
OUR MEASURED RESULT
$87 → $27/mo
70% cost reduction with zero quality loss on writing tasks
Here’s what eats your budget:
- Heartbeats: Run every 55-60 minutes, 24/7. If routed to an expensive model, that’s $15-30/month doing nothing.
- Context bloat: Workspace files loaded every session. A 420-line AGENTS.md wastes tokens before you even ask a question.
- Wrong model for the job: Using Opus for CSV parsing is like hiring a lawyer to sort your mail.
- Cache misses: Dynamic content in system prompts (timestamps, dates) destroys caching and costs 10x more.
Key takeaway: The single biggest lever is model routing. Fix that first and you’ll cut 50-60% immediately.
Related: How I Cut My AI Agent Costs by 70% with Smart Model Routing goes deep on routing alone.
The 5-Tier Model Routing System
This is the config that changed everything. Instead of one model for all tasks, we pin five tiers matched to task complexity.
| Tier | Model | Alias | Cost per 1M Tokens | Use For |
|---|---|---|---|---|
| Budget | Gemini 2.0 Flash | fast | $0.10 | Heartbeats, classification, file ops, data extraction |
| Worker | Kimi K2.5 | kimi | $0.60 | SEO analysis, agentic browsing, multi-step reasoning |
| Writer | Claude Sonnet 4.5 | sonnet | $3.00 | All content writing — articles, blog posts, long-form |
| Quality | Claude Sonnet / GPT | sonnet | $3.00 | Complex reasoning, code architecture, security audits |
| Frontier | Claude Opus 4.6 | opus | $15.00 | Only when explicitly requested via /model opus |
The math is simple. If 75% of your tasks run on Flash ($0.10) instead of auto-routed to Sonnet ($3.00), you save 96% on those tasks.
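To make that arithmetic concrete, here is a small Python sketch. The per-1M prices come from the tier table above; the 75% routed share and the 10M-token monthly volume are illustrative assumptions, not measured figures:

```python
# Illustrative cost math for routing a share of tasks to a cheaper tier.
# Prices are per 1M tokens, matching the tier table above.
FLASH_PRICE = 0.10   # Gemini 2.0 Flash
SONNET_PRICE = 3.00  # Claude Sonnet 4.5

def routed_cost(total_tokens_m: float, cheap_share: float) -> float:
    """Cost when `cheap_share` of tokens run on Flash, the rest on Sonnet."""
    return (total_tokens_m * cheap_share * FLASH_PRICE
            + total_tokens_m * (1 - cheap_share) * SONNET_PRICE)

# Savings on the routed portion alone: 1 - 0.10/3.00, roughly 96.7%
savings_on_routed = 1 - FLASH_PRICE / SONNET_PRICE
print(f"Savings on Flash-routed tasks: {savings_on_routed:.1%}")

# Blended example: 10M tokens/month, 75% routed to Flash
print(f"All-Sonnet: ${routed_cost(10, 0.0):.2f} vs routed: ${routed_cost(10, 0.75):.2f}")
```

Note the blended bill drops from $30.00 to $8.25 in this example even though a quarter of the traffic still runs on Sonnet.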
💡 Pro Tip
When a user says “use the best model,” that means Sonnet — not Opus. Opus is only for tasks where the user explicitly types /model opus. This one rule prevents most accidental overspend.
Here’s the models section from our actual openclaw.json:
OPENCLAW.JSON — MODELS CONFIG
```json
{
  "agents": {
    "defaults": {
      "model": "google/gemini-2.0-flash-001",
      "models": {
        "fast": { "id": "google/gemini-2.0-flash-001" },
        "kimi": { "id": "moonshot/kimi-k2.5" },
        "sonnet": { "id": "anthropic/claude-sonnet-4-5-20250514" },
        "opus": { "id": "anthropic/claude-opus-4-6" },
        "sonar": { "id": "perplexity/sonar-pro" }
      }
    }
  }
}
```

Notice the default is `gemini-2.0-flash-001`. Every task starts cheap. You escalate to a better model only when the task genuinely demands it.
The Routing Commandments
These rules are baked into our agent instructions:
- Heartbeats always use `fast`. Configured in openclaw.json. Never override.
- Sub-agents default to `fast`. Only escalate if the task needs real reasoning.
- Never use `opus` for automation, cron jobs, or batch processing.
- All content writing must use `sonnet`. Switch with `/model sonnet` before writing.
- SEO analysis uses `kimi`. Kimi K2.5 excels at agentic browsing tasks.
- Batch operations: 10 items per prompt, not 10 separate prompts. Saves 40%.
Want the Full Config File?
We’ve open-sourced our complete openclaw.json with all 5 model tiers, agent configs, and caching settings. Grab it from our GitHub repo.
Related: AI Model Showdown for SEO: Gemini Flash vs Sonnet vs Kimi K2.5 compares each model’s quality for SEO-specific tasks.
Heartbeat Configuration
The heartbeat is OpenClaw’s “are you still there?” check. It runs continuously — typically every 55-60 minutes. If you get this wrong, you’re burning money around the clock.
Our Config
OPENCLAW.JSON — HEARTBEAT
```json
"heartbeat": {
  "model": "google/gemini-2.0-flash-001",
  "interval": 55,
  "directPolicy": "allow"
}
```

Two decisions matter here:
- ➤ Model: Gemini Flash — costs about $1.50/month for 24/7 heartbeats. Opus would cost $15-30/month for the same checks.
- ➤ Interval: 55 minutes — aligns with our 1-hour prompt cache TTL. The heartbeat keeps the cache warm so your next real conversation doesn’t pay full price.
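A rough estimator for the heartbeat line item. The tokens-per-check value here is an assumption picked to land near the ~$1.50 figure; measure your real context size with /status and substitute it:

```python
# Rough monthly heartbeat cost: calls/month x tokens/call x price per 1M.
# tokens_per_call is an ASSUMPTION; check yours with /status.
def heartbeat_cost(interval_min: int, tokens_per_call: int,
                   price_per_1m: float, days: int = 30) -> float:
    calls = days * 24 * 60 / interval_min  # ~785 calls/month at 55 min
    return calls * tokens_per_call / 1_000_000 * price_per_1m

flash = heartbeat_cost(55, 19_000, 0.10)
print(f"Flash heartbeats: ~${flash:.2f}/mo")
# Whatever the token count, Opus at $15/1M is 150x Flash's $0.10/1M.
```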
Why Not Ollama (Free)?
Some guides recommend running heartbeats on Ollama, a free local model. We tried it. Don’t.
⚠️ Warning — The “3 AM Vulnerability”
Local models like Ollama lack the prompt injection hardening that frontier API models receive during training. The heartbeat runs 24/7 — including at 3 AM when you’re asleep. If it processes a compromised email or webpage, a local model is far more likely to follow malicious instructions. The $1.50/month for Gemini Flash buys you training-level injection resistance.
This isn’t theoretical. Security researchers at Palo Alto Networks have documented prompt injection attacks against AI agents. The heartbeat is a particularly attractive target because it runs unattended.
Related: Securing Your AI Agent: ClawHavoc, CVE-2026-25253 & How We Hardened covers the full security picture.
Prompt Caching and Context Management
Prompt caching gives you a 90% discount on tokens the model has already seen. On Anthropic models through OpenRouter, cached reads cost 10% of normal. But one wrong config destroys it.
Enable Long Cache
OPENCLAW.JSON — CACHING
```json
"params": {
  "cacheRetention": "long",
  "contextTokens": 50000
}
```

Setting `cacheRetention` to `"long"` gives you a 1-hour TTL. Combined with our 55-minute heartbeat interval, the cache stays warm continuously.
What Destroys Caching
Three things will kill your cache hit rate:
⚠️ Cache Killers
1. Dynamic timestamps in system prompts. If your workspace files inject “Current Date: March 2, 2026” into the system prompt, every single call has a different prefix. Cache miss every time.
2. Changing SOUL.md or AGENTS.md mid-session. These files form the system prompt. Edit them = invalidate the cache.
3. OpenRouter provider pass-through issues. Some providers don’t forward cache_control headers. Check Issue #9600 if you suspect cache isn’t working.
Trim Your Workspace Files
Every file loaded at session startup eats tokens. We reduced our session load from 420 lines to 158 lines — a 62% reduction.
How we did it:
| File | Before | After | What Changed |
|---|---|---|---|
| AGENTS.md | 264 lines | 95 lines | Moved group chat rules, heartbeat guide, project context to separate on-demand files |
| IDENTITY.md | 24 lines | 5 lines | Removed template boilerplate, filled in actual values |
| USER.md | 18 lines | 6 lines | Same — removed template, added real info |
| TOOLS.md | 41 lines | 11 lines | Stripped examples, kept only our actual tools |
| BOOTSTRAP.md | 56 lines | Deleted | First-run file, docs say delete after setup |
💡 Pro Tip
Keep static files (SOUL.md, IDENTITY.md) separate from dynamic files (daily memory notes). Static files cache perfectly. Dynamic files should load last so they don’t invalidate the cache prefix of everything before them.
Session Hygiene Commands
These commands are your daily tools for controlling context size:
📝 OpenClaw Commands
- `/compact` — Compress context when it grows past 30K tokens
- `/new` — Start fresh session after completing a task
- `/status` — Check context size, model, and token usage
- `/model X` — Switch to a specific model tier (e.g., `/model sonnet`)
Rule of thumb: Run /compact after every major task. Start /new sessions rather than letting context bloat. Check /status before writing — make sure you’re on the right model.
QMD Local Search
QMD (Query Markup Documents) is a local search engine by Tobi Lutke. It uses BM25 + vector search + LLM reranking to find relevant content from your knowledge base — and only injects the relevant snippets, not entire files.
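QMD's internals aren't reproduced here, but hybrid retrieval generally follows the pattern in this sketch: score documents with BM25 and vector similarity separately, min-max normalize each list, then fuse with a weighted sum before reranking. The weights and toy scores below are illustrative, not QMD's actual values:

```python
# Illustrative hybrid-search score fusion (NOT QMD's actual code).
# Assume BM25 and vector-similarity scores are already computed per document.
def normalize(scores: dict) -> dict:
    """Min-max normalize a {doc: score} mapping into [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    span = (hi - lo) or 1.0
    return {d: (s - lo) / span for d, s in scores.items()}

def hybrid_rank(bm25: dict, vec: dict, alpha: float = 0.5, k: int = 5):
    """Weighted fusion of keyword and semantic scores; return top-k docs."""
    b, v = normalize(bm25), normalize(vec)
    fused = {d: alpha * b[d] + (1 - alpha) * v[d] for d in bm25}
    return sorted(fused, key=fused.get, reverse=True)[:k]

# Toy scores for three memory snippets
bm25 = {"notes-jan.md": 7.1, "notes-feb.md": 2.3, "todo.md": 0.4}
vec = {"notes-jan.md": 0.62, "notes-feb.md": 0.88, "todo.md": 0.15}
print(hybrid_rank(bm25, vec, k=2))
```

Only the winning snippets get injected into context, which is where the token savings come from.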
TOKEN REDUCTION
90%
Fewer memory tokens injected per session with QMD vs full file loading
Quick Setup
QMD requires WSL2 on Windows. It does not work on native Windows (missing sqlite-vec binary, tsx module errors).
- Install build tools: `sudo apt-get install -y build-essential`
- Install QMD: `npm install -g @tobilu/qmd`
- Verify: `qmd --version` (should show 1.0.7+)
- First run auto-downloads ~2GB of GGUF models (one-time)
Then add this to your openclaw.json:
OPENCLAW.JSON — QMD MEMORY BACKEND
```json
"memory": {
  "backend": "qmd",
  "qmd": {
    "searchMode": "hybrid",
    "includeDefaultMemory": true,
    "paths": ["~/openclaw-workspace/memory"],
    "updateInterval": 300,
    "maxResults": 5
  }
}
```

💡 Pro Tip
QMD runs 100% locally. No API calls, no data leaves your machine. Search latency is about 47ms per lookup. Install it once your memory files exceed ~2,000 tokens total — before that, full file loading is fine.
Related: Setting Up QMD for Local AI Search: Installation & Real Results covers the full walkthrough including the WSL2 gotchas we hit.
Budget Controls
Even with perfect routing, mistakes happen. A runaway loop, a forgotten /model opus switch, or a sub-agent that escalates unexpectedly. Budget guardrails are your safety net.
OpenRouter Daily Limit
- Go to openrouter.ai/settings/limits
- Create a guardrail: $3/day
- Assign it to your OpenClaw API key
This caps your worst-case at ~$90/month. Our expected spend is $18-27/month, so the $3/day limit gives plenty of headroom for busy days without allowing runaway costs.
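The worst-case arithmetic, spelled out:

```python
# Worst-case monthly spend under a daily guardrail vs. expected spend.
DAILY_CAP = 3.00
worst_case = DAILY_CAP * 30        # $90/month absolute ceiling
expected = (18 + 27) / 2           # midpoint of the $18-27 estimate
headroom = worst_case / expected   # ~4x room for busy days
print(f"ceiling ${worst_case:.0f}/mo, ~{headroom:.0f}x expected spend")
```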
⚠️ Warning
Without a budget guardrail, a single misconfigured batch job could burn $50+ in one night. We’ve seen reports of users hitting $200+ bills from automation loops that escalated to Opus. Set the guardrail before going live.
Monitoring Workflow
Check these regularly:
- ✔ Daily: Run `/status` to check context size and current model
- ✔ Weekly: Review OpenRouter dashboard for per-model cost breakdown
- ✔ Monthly: Screenshot dashboard, compare against estimates, adjust tiers if needed
Per-Agent Configuration
Instead of one model for everything, define specialized agents with pinned models and context limits:
| Agent ID | Pinned Model | Context Limit | Purpose |
|---|---|---|---|
| `content-writer` | Claude Sonnet | 80K tokens | Article writing, rewrites, content creation |
| `seo-analyst` | Kimi K2.5 | 50K tokens | SEO audits, keyword research, competitor analysis |
| `data-worker` | Gemini Flash | 30K tokens | CSV processing, API calls, data extraction |
| Default (all others) | Gemini Flash | 50K tokens | Everything else starts cheap |
OPENCLAW.JSON — AGENT LIST
```json
"list": [
  {
    "id": "content-writer",
    "model": "anthropic/claude-sonnet-4-5-20250514",
    "params": { "contextTokens": 80000 }
  },
  {
    "id": "seo-analyst",
    "model": "moonshot/kimi-k2.5",
    "params": { "contextTokens": 50000 }
  },
  {
    "id": "data-worker",
    "model": "google/gemini-2.0-flash-001",
    "params": { "contextTokens": 30000 }
  }
]
```

Context limits matter. A data worker processing CSVs doesn’t need 80K tokens of context. Capping it at 30K forces compaction earlier and keeps costs tight.
Compaction itself runs on Flash — don’t waste Sonnet tokens on mechanical text summarization.
OPENCLAW.JSON — COMPACTION
```json
"compaction": {
  "model": "google/gemini-2.0-flash-001"
}
```

Running a Multi-Agent SEO Operation?
See how we wired OpenClaw + n8n + 10 Python scripts into a full AI SEO stack for $27/month.
Related: Building an SEO Audit Swarm with AI Agents shows how our seo-analyst and data-worker agents work together.
Common Token Optimization Mistakes
Mistake 1: Using Opus for Batch Jobs
Opus ($15/1M tokens) is a reasoning powerhouse. But if you’re processing 50 URLs, extracting titles, or running classification tasks — that’s Flash territory. We’ve seen batch jobs that should cost $0.15 cost $22 because the model wasn’t switched.
Fix: Pin batch and automation tasks to fast. Only escalate if the output quality is measurably bad.
Mistake 2: Timestamps in System Prompts
If your workspace files inject “Current Date and Time: March 2, 2026 14:30:05” into the system prompt, you’ve just invalidated your entire cache. Every call gets a unique prefix. Every call pays full price.
Fix: Keep workspace files 100% static. Let the model infer the date from conversation context, or inject it in the user message (not the system prompt).
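One way to apply that fix, sketched with a generic chat-messages structure (not OpenClaw's internal format): the system prompt stays byte-identical across calls, so the cached prefix survives, and the date rides along in the user message.

```python
# Keep the cacheable system prompt static; put volatile values (like the
# date) in the user message. Generic chat format used for illustration.
from datetime import date

SYSTEM_PROMPT = "You are a writing assistant for our SEO workspace."  # static

def build_messages(user_text: str) -> list:
    stamped = f"(Today is {date.today().isoformat()}.) {user_text}"
    return [
        {"role": "system", "content": SYSTEM_PROMPT},  # identical every call
        {"role": "user", "content": stamped},          # volatile part lives here
    ]

a = build_messages("Draft an outline for the caching article.")
b = build_messages("Now expand section 2.")
print(a[0] == b[0])  # True: the shared prefix stays cacheable
```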
Mistake 3: Never Compacting Sessions
OpenClaw sessions grow. A 50-turn conversation about SEO analysis can hit 100K+ tokens. Every subsequent message pays for all that context.
Fix: Run /compact after completing each task. Start /new sessions between unrelated tasks. Check /status regularly — if context exceeds 30K for a simple task, compact immediately.
Mistake 4: Loading Entire Files as Memory
Without QMD, OpenClaw dumps your entire MEMORY.md, all daily notes, and any referenced files directly into context. A 5,000-token memory file is loaded in full even when the conversation only needs one paragraph.
Fix: Install QMD. It returns only the 5 most relevant snippets instead of the entire file. 90% reduction in memory tokens.
Mistake 5: Not Setting a Budget Guardrail
“I’ll monitor it manually” works until it doesn’t. One unattended batch job at 3 AM can blow your monthly budget in a single night.
Fix: Set a $3/day guardrail on OpenRouter immediately. Takes 30 seconds. Prevents the $200 surprise bills that show up on forums regularly.
The Complete Cost Breakdown
Here’s what our operation actually costs with all optimizations applied:
| Category | % of Tasks | Model | Monthly Cost |
|---|---|---|---|
| Heartbeats & idle checks | ~15% | Gemini Flash | ~$1.50 |
| Data extraction & file ops | ~40% | Gemini Flash | ~$4.00 |
| SEO analysis & browsing | ~15% | Kimi K2.5 | ~$5.00 |
| Content writing | ~25% | Claude Sonnet | ~$14.00 |
| Architecture & debugging | ~5% | Claude Opus | ~$3.00 |
| Total | 100% | Mixed | ~$27.50 |
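The table's line items can be sanity-checked in a few lines:

```python
# Verify the cost-breakdown table: line items should sum to the stated total.
line_items = {
    "heartbeats_idle": 1.50,       # Gemini Flash
    "data_extraction": 4.00,       # Gemini Flash
    "seo_analysis": 5.00,          # Kimi K2.5
    "content_writing": 14.00,      # Claude Sonnet
    "architecture_debugging": 3.00 # Claude Opus
}
total = sum(line_items.values())
print(f"${total:.2f}")  # $27.50, matching the table's total row
```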
MONTHLY COST COMPARISON
$27 vs $87
Same quality output. Same number of tasks. Just smarter routing.
The writing budget ($14/month on Sonnet) is non-negotiable. Content quality is what drives SEO rankings. You save everywhere else so you can afford to spend here.
“The teams that understand model routing will build 10x more with the same budget. It’s not about spending less — it’s about spending on the right tokens.”
— Matt Ganzak, OpenClaw Token Optimization Guide, 2026
Security Guardrails (Non-Negotiable)
Token optimization shouldn’t compromise security. These are the guardrails we never disable:
- 🔒 Heartbeat uses API model with prompt injection resistance (not local Ollama)
- 🔒 Gateway bound to `127.0.0.1` only — never exposed to the network
- 🔒 Token-based gateway authentication
- 🔒 Phone/user allowlist on messaging channel
- 🔒 HEARTBEAT.md kept empty — minimal attack surface during heartbeat cycles
- 🔒 Never use local models for tasks involving untrusted content (web scraping, email processing)
Related: Securing Your AI Agent in 2026: ClawHavoc & CVE-2026-25253 covers the ClawHavoc supply chain attack (1,184 malicious skills), the WebSocket RCE, and how we hardened against them.
Frequently Asked Questions
Is OpenClaw free to run?
OpenClaw itself is free and open-source. The cost comes from the AI models it calls through APIs like OpenRouter. With our optimized config, expect $18-27/month for a production SEO operation. A minimal personal assistant setup can run under $5/month on Gemini Flash alone.
Can I use Ollama to make it completely free?
Technically yes, but we don’t recommend it for production. Local models lack the prompt injection hardening of API models. For a personal hobby project with no sensitive data, Ollama is fine. For a business operation handling credentials, emails, and financial data — use API models with training-level security. Gemini Flash at $0.10/1M tokens is nearly free anyway.
How much does Claude Opus cost on OpenClaw?
About $15 per million tokens. In our setup, Opus handles roughly 5% of tasks (architecture decisions, complex debugging, security audits), costing about $3/month. The key is never letting Opus touch routine tasks. A single batch job accidentally routed to Opus can cost more than your entire month of Flash usage.
Does prompt caching work with OpenRouter?
Yes, with caveats. Set cacheRetention: "long" for a 1-hour TTL. Cached reads get a 90% discount on Anthropic models. However, some OpenRouter provider pass-throughs don’t forward cache_control headers properly (see GitHub Issue #9600). Verify by checking that cacheRead > 0 after multiple turns in the same session.
What’s the minimum setup for token optimization?
Three changes that take five minutes:
- Set `"model": "google/gemini-2.0-flash-001"` as default in openclaw.json
- Set `"heartbeat.model": "google/gemini-2.0-flash-001"`
- Set a $3/day budget guardrail on OpenRouter
That alone cuts 50-60% off most users’ bills. Add caching, QMD, and per-agent configs later for the remaining savings.
How do I check if my optimizations are working?
Run /status in any OpenClaw conversation. It shows your current model, context size, and token usage. Then check the OpenRouter dashboard for per-model spending breakdown. After 24 hours, verify: heartbeats hit Flash (not Sonnet/Opus), writing tasks hit Sonnet, and daily spend stays under $3.
Getting Started: Your Next Steps
☑ Quick-Start Checklist
- ☐ Set default model to Gemini Flash in openclaw.json
- ☐ Pin heartbeat to Gemini Flash at 55-minute interval
- ☐ Set `cacheRetention: "long"`
- ☐ Trim workspace files (remove template boilerplate)
- ☐ Define per-agent models and context limits
- ☐ Install QMD in WSL2 for local memory search
- ☐ Run `/status` and verify after 24 hours
Here’s where to go based on your situation:
- 🚀 Just getting started? Apply the 3 quick fixes from the FAQ above. Takes 5 minutes, saves 50%.
- 🚀 Want to go deeper? Read How I Cut Costs by 70% with Model Routing for the full routing breakdown.
- 🚀 Building an SEO operation? See Why We Built a $27/mo AI SEO Operation for the complete stack.
- 🚀 Concerned about security? Start with our security hardening guide before optimizing for cost.
🔎 Key Takeaways
- Model routing is the biggest lever — switching defaults from auto to Gemini Flash cuts 50-60% immediately
- Heartbeats should use the cheapest API model — not Ollama (security risk) and not your default writer model
- Prompt caching gives 90% discounts — but only if your system prompts are 100% static
- QMD reduces memory tokens by 90% — install it once your memory files grow past 2,000 tokens
- Budget guardrails are non-negotiable — $3/day on OpenRouter prevents surprise bills
- Our real result: $87/month down to $27/month — same output quality, smarter routing
Explore our complete AI Automation & Workflows hub for more guides on building production AI agent systems.
