{"id":261986,"date":"2026-03-02T16:34:49","date_gmt":"2026-03-02T07:34:49","guid":{"rendered":"https:\/\/designcopy.net\/en\/?p=261986"},"modified":"2026-07-07T09:29:44","modified_gmt":"2026-07-07T00:29:44","slug":"openclaw-token-optimization-guide","status":"publish","type":"post","link":"https:\/\/designcopy.net\/ko\/openclaw-token-optimization-guide\/","title":{"rendered":"OpenClaw Token Optimization: The Complete 2026 Guide"},"content":{"rendered":"<p>We ran OpenClaw at its defaults for three weeks. The bill? $87 in a single month. Most of it was wasted on a frontier model doing simple file reads.<\/p>\n<p>This guide breaks down every optimization we applied to get that number down to <strong>$27\/month<\/strong> \u2014 without losing quality on the tasks that matter. You&#8217;ll get the exact configs, the real cost math, and the security tradeoffs we weighed.<\/p>\n<p>After running this stack for a production SEO operation with 500+ planned posts, here&#8217;s what actually moves the needle.<\/p>\n<p>> <strong>Quick Navigation<\/strong>: <a href=\"#what-is-token-optimization\">What Is Token Optimization<\/a> | <a href=\"#the-5-tier-model-routing-system\">5-Tier Model Routing<\/a> | <a href=\"#heartbeat-configuration\">Heartbeat Config<\/a> | <a href=\"#prompt-caching-and-context-management\">Prompt Caching<\/a> | <a href=\"#qmd-local-search\">QMD Local Search<\/a> | <a href=\"#budget-controls\">Budget Controls<\/a> | <a href=\"#common-mistakes\">Common Mistakes<\/a> | <a href=\"#faq\">FAQ<\/a><\/p>\n<hr>\n<h2>What Is Token Optimization (And Why Your OpenClaw Bill Is Too High)<\/h2>\n<p>Token optimization means spending the <strong>least amount of money per task<\/strong> without degrading output quality. In OpenClaw, every message you send, every heartbeat check, every sub-agent call burns tokens. And tokens cost money.<\/p>\n<p>The problem is straightforward. 
OpenClaw&#8217;s default config uses <code>openrouter\/auto<\/code>, which picks a model for each prompt based on expected output quality, not cost. That means your heartbeat (a simple &#8220;are you alive?&#8221; check that runs every hour) might hit Claude Opus at $15 per million input tokens instead of Gemini Flash at $0.10.<\/p>\n<div style=\"background: #ecfdf5; border: 2px solid #10b981; border-radius: 12px; padding: 20px 24px; margin: 24px 0; text-align: center;\">\n<p style=\"margin: 0; font-size: 14px; color: #059669; font-weight: 600;\">OUR MEASURED RESULT<\/p>\n<p style=\"margin: 8px 0 0 0; font-size: 36px; font-weight: bold; color: #047857;\">$87 \u2192 $27\/mo<\/p>\n<p style=\"margin: 4px 0 0 0; font-size: 14px; color: #6b7280;\">70% cost reduction with zero quality loss on writing tasks<\/p>\n<\/div>\n<p>Here&#8217;s what eats your budget:<\/p>\n<ul>\n<li><strong>Heartbeats:<\/strong> Run every 55-60 minutes, 24\/7. If routed to an expensive model, that&#8217;s $15-30\/month doing nothing.<\/li>\n<li><strong>Context bloat:<\/strong> Workspace files load at the start of every session. Ours totaled 420 lines (AGENTS.md alone was 264), and every one of them cost tokens before we asked a single question.<\/li>\n<li><strong>Wrong model for the job:<\/strong> Using Opus for CSV parsing is like hiring a lawyer to sort your mail.<\/li>\n<li><strong>Cache misses:<\/strong> Dynamic content in system prompts (timestamps, dates) breaks caching, and uncached input costs 10x as much as a cache read.<\/li>\n<\/ul>\n<p><strong>Key takeaway<\/strong>: The single biggest lever is model routing. 
Fix that first and you&#8217;ll cut 50-60% immediately.<\/p>\n<p><strong>Related<\/strong>: <a href=\"\/ai-automation\/model-routing-cost-reduction\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">How I Cut My AI Agent Costs by 70% with Smart Model Routing<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> goes deep on routing alone.<\/p>\n<hr>\n<h2>The 5-Tier Model Routing System<\/h2>\n<p>This is the config that changed everything. Instead of one model for all tasks, we pin <strong>five tiers<\/strong> matched to task complexity.<\/p>\n<div style=\"overflow-x:auto; margin:24px 0; border-radius:8px; border:1px solid #e2e8f0;\">\n<table style=\"width:100%; border-collapse:collapse; font-size:15px; line-height:1.6;\">\n<thead>\n<tr>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Tier<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Model<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Alias<\/th>\n<th style=\"text-align:right; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Input Cost per 1M Tokens<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Use For<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>Budget<\/strong><\/td>\n<td style=\"text-align:left; padding:10px 16px; 
background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Gemini 2.0 Flash<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>fast<\/code><\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">$0.10<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Heartbeats, classification, file ops, data extraction<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>Worker<\/strong><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Kimi K2.5<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>kimi<\/code><\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">$0.60<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">SEO analysis, agentic browsing, multi-step reasoning<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>Writer<\/strong><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Claude Sonnet 4.5<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>sonnet<\/code><\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">$3.00<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; 
color:#334155;\">All content writing \u2014 articles, blog posts, long-form<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>Quality<\/strong><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Claude Sonnet \/ GPT<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>sonnet<\/code><\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">$3.00<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Complex reasoning, code architecture, security audits<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>Frontier<\/strong><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Claude Opus 4.6<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>opus<\/code><\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">$15.00<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Only when explicitly requested via <code>\/model opus<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>The math is simple. 
If 75% of your tasks run on Flash ($0.10) instead of being auto-routed to Sonnet ($3.00), you pay about 3.3% of the Sonnet price on those tasks, a saving of <strong>about 97% on those tasks<\/strong>.<\/p>\n<div style=\"background: #f0f9ff; border-left: 4px solid #0ea5e9; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #0369a1;\">&#128161; Pro Tip<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">When a user says &#8220;use the best model,&#8221; that means Sonnet \u2014 not Opus. Opus is only for tasks where the user explicitly types <code>\/model opus<\/code>. This one rule prevents most accidental overspend.<\/p>\n<\/div>\n<p>Here&#8217;s the models section from our actual <code>openclaw.json<\/code>:<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">OPENCLAW.JSON \u2014 MODELS CONFIG<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">{\n  \"agents\": {\n    \"defaults\": {\n      \"model\": \"google\/gemini-2.0-flash-001\",\n      \"models\": {\n        \"fast\":   { \"id\": \"google\/gemini-2.0-flash-001\" },\n        \"kimi\":   { \"id\": \"moonshot\/kimi-k2.5\" },\n        \"sonnet\": { \"id\": \"anthropic\/claude-sonnet-4-5-20250929\" },\n        \"opus\":   { \"id\": \"anthropic\/claude-opus-4-6\" },\n        \"sonar\":  { \"id\": \"perplexity\/sonar-pro\" }\n      }\n    }\n  }\n}<\/pre>\n<\/div>\n<p>Notice the default is <code>gemini-2.0-flash-001<\/code>. Every task starts cheap. You <strong>escalate<\/strong> to a better model only when the task genuinely demands it.<\/p>\n<h3>The Routing Commandments<\/h3>\n<p>These rules are baked into our agent instructions:<\/p>\n<ol>\n<li>Heartbeats <strong>always<\/strong> use <code>fast<\/code>. Configured in openclaw.json. 
Never override.<\/li>\n<li>Sub-agents default to <code>fast<\/code>. Only escalate if the task needs real reasoning.<\/li>\n<li>Never use <code>opus<\/code> for automation, cron jobs, or batch processing.<\/li>\n<li>All content writing must use <code>sonnet<\/code>. Switch with <code>\/model sonnet<\/code> before writing.<\/li>\n<li>SEO analysis uses <code>kimi<\/code>. Kimi K2.5 excels at agentic browsing tasks.<\/li>\n<li>Batch operations: 10 items per prompt, not 10 separate prompts. Saves 40%.<\/li>\n<\/ol>\n<div style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 12px; padding: 24px 32px; margin: 32px 0; color: white; text-align: center;\">\n<h3 style=\"color: white; margin-top: 0; font-size: 22px;\">Want the Full Config File?<\/h3>\n<p style=\"color: rgba(255,255,255,0.9); font-size: 16px;\">We&#8217;ve open-sourced our complete openclaw.json with all 5 model tiers, agent configs, and caching settings. Grab it from our GitHub repo.<\/p>\n<\/div>\n<p><strong>Related<\/strong>: <a href=\"\/ai-seo\/ai-model-showdown-seo\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">AI Model Showdown for SEO: Gemini Flash vs Sonnet vs Kimi K2.5<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> compares each model&#8217;s quality for SEO-specific tasks.<\/p>\n<hr>\n<h2>Heartbeat Configuration<\/h2>\n<p>The heartbeat is OpenClaw&#8217;s &#8220;are you still there?&#8221; check. It runs around the clock, typically every 55-60 minutes. 
If you get this wrong, you&#8217;re burning money around the clock.<\/p>\n<h3>Our Config<\/h3>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">OPENCLAW.JSON \u2014 HEARTBEAT<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">\"heartbeat\": {\n  \"model\": \"google\/gemini-2.0-flash-001\",\n  \"interval\": 55,\n  \"directPolicy\": \"allow\"\n}<\/pre>\n<\/div>\n<p>Two decisions matter here:<\/p>\n<ul style=\"list-style: none; padding-left: 0;\">\n<li style=\"padding: 4px 0;\">&#10148; <strong>Model: Gemini Flash<\/strong> \u2014 costs about $1.50\/month for 24\/7 heartbeats. Opus would cost $15-30\/month for the same checks.<\/li>\n<li style=\"padding: 4px 0;\">&#10148; <strong>Interval: 55 minutes<\/strong> \u2014 aligns with our 1-hour prompt cache TTL. The heartbeat keeps the cache warm so your next real conversation doesn&#8217;t pay full price.<\/li>\n<\/ul>\n<h3>Why Not Ollama (Free)?<\/h3>\n<p>Some guides recommend running heartbeats on a free local model through Ollama. We tried it. Don&#8217;t.<\/p>\n<div style=\"background: #fef2f2; border-left: 4px solid #ef4444; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #dc2626;\">&#9888;&#65039; Warning \u2014 The &#8220;3 AM Vulnerability&#8221;<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">The small local models you run through Ollama typically lack the prompt injection hardening that frontier API models receive during training. The heartbeat runs 24\/7 \u2014 including at 3 AM when you&#8217;re asleep. If it processes a compromised email or webpage, a local model is far more likely to follow malicious instructions. 
The $1.50\/month for Gemini Flash buys you training-level injection resistance.<\/p>\n<\/div>\n<p>This isn&#8217;t theoretical. Security researchers at Palo Alto Networks have documented prompt injection attacks against AI agents. The heartbeat is a particularly attractive target because it runs unattended.<\/p>\n<p><strong>Related<\/strong>: <a href=\"\/ai-automation\/openclaw-security-clawhavoc\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">Securing Your AI Agent: ClawHavoc, CVE-2026-25253 &#038; How We Hardened<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> covers the full security picture.<\/p>\n<hr>\n<h2>Prompt Caching and Context Management<\/h2>\n<p>Prompt caching gives you a <strong>90% discount<\/strong> on input tokens the model has already seen. On Anthropic models through OpenRouter, cache reads cost 10% of the normal input price, while writing to the cache costs more than a normal request. One wrong config setting wipes out the savings.<\/p>\n<h3>Enable Long Cache<\/h3>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">OPENCLAW.JSON \u2014 CACHING<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">\"params\": {\n  \"cacheRetention\": \"long\",\n  \"contextTokens\": 50000\n}<\/pre>\n<\/div>\n<p>Setting <code>cacheRetention<\/code> to <code>\"long\"<\/code> gives you a 1-hour TTL. 
Combined with our 55-minute heartbeat interval, the cache stays warm continuously, as long as the heartbeat calls the same model with the same prompt prefix; prompt caches are not shared across models.<\/p>\n<h3>What Destroys Caching<\/h3>\n<p>Three things will kill your cache hit rate:<\/p>\n<div style=\"background: #fef2f2; border-left: 4px solid #ef4444; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #dc2626;\">&#9888;&#65039; Cache Killers<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\"><strong>1. Dynamic timestamps in system prompts.<\/strong> If your workspace files inject &#8220;Current Date: March 2, 2026&#8221; into the system prompt, the prefix changes every day; add the time of day and it changes on every call. Every change is a cache miss.<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\"><strong>2. Changing SOUL.md or AGENTS.md mid-session.<\/strong> These files form the system prompt, so editing them invalidates the cache.<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\"><strong>3. OpenRouter provider pass-through issues.<\/strong> Some providers don&#8217;t pass through the <code>cache_control<\/code> breakpoints in your request. Check Issue #9600 if you suspect caching isn&#8217;t working.<\/p>\n<\/div>\n<h3>Trim Your Workspace Files<\/h3>\n<p>Every file loaded at session startup eats tokens. 
We reduced our session load from <strong>420 lines to 158 lines<\/strong> \u2014 a 62% reduction.<\/p>\n<p>How we did it:<\/p>\n<div style=\"overflow-x:auto; margin:24px 0; border-radius:8px; border:1px solid #e2e8f0;\">\n<table style=\"width:100%; border-collapse:collapse; font-size:15px; line-height:1.6;\">\n<thead>\n<tr>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">File<\/th>\n<th style=\"text-align:right; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Before<\/th>\n<th style=\"text-align:right; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">After<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">What Changed<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">AGENTS.md<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">264 lines<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">95 lines<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Moved group chat rules, heartbeat guide, project context to separate on-demand files<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">IDENTITY.md<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">24 
lines<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">5 lines<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Removed template boilerplate, filled in actual values<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">USER.md<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">18 lines<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">6 lines<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Same \u2014 removed template, added real info<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">TOOLS.md<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">41 lines<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">11 lines<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Stripped examples, kept only our actual tools<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">BOOTSTRAP.md<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">56 lines<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Deleted<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; 
First-run file">
border-bottom:1px solid #e2e8f0; color:#334155;\">First-run file; the docs say to delete it after setup<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"background: #f0f9ff; border-left: 4px solid #0ea5e9; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #0369a1;\">&#128161; Pro Tip<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">Keep static files (SOUL.md, IDENTITY.md) separate from dynamic files (daily memory notes). Static files cache perfectly. Dynamic files should load last so they don&#8217;t invalidate the cache prefix of everything before them.<\/p>\n<\/div>\n<h3>Session Hygiene Commands<\/h3>\n<p>These commands are your daily tools for controlling context size:<\/p>\n<div style=\"background: #fefce8; border: 2px solid #facc15; border-radius: 12px; padding: 20px 24px; margin: 24px 0;\">\n<p style=\"margin: 0 0 8px 0; font-weight: 600; color: #854d0e;\">&#128221; OpenClaw Commands<\/p>\n<pre style=\"margin: 0; background: #fffbeb; padding: 12px; border-radius: 6px; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.5; white-space: pre-wrap; color: #422006;\">\/compact   \u2014 Compress context when it grows past 30K tokens\n\/new       \u2014 Start a fresh session after completing a task\n\/status    \u2014 Check context size, model, and token usage\n\/model X   \u2014 Switch to a specific model tier (e.g., \/model sonnet)<\/pre>\n<\/div>\n<p><strong>Rule of thumb<\/strong>: Run <code>\/compact<\/code> after every major task. Start <code>\/new<\/code> sessions rather than letting context bloat. Check <code>\/status<\/code> before writing \u2014 make sure you&#8217;re on the right model.<\/p>\n<hr>\n<h2>QMD Local Search<\/h2>\n<p>QMD (Query Markup Documents) is a local search engine by Tobi L\u00fctke. 
It uses BM25 + vector search + LLM reranking to find relevant content from your knowledge base \u2014 and only injects the relevant snippets, not entire files.<\/p>\n<div style=\"background: #ecfdf5; border: 2px solid #10b981; border-radius: 12px; padding: 20px 24px; margin: 24px 0; text-align: center;\">\n<p style=\"margin: 0; font-size: 14px; color: #059669; font-weight: 600;\">TOKEN REDUCTION<\/p>\n<p style=\"margin: 8px 0 0 0; font-size: 36px; font-weight: bold; color: #047857;\">90%<\/p>\n<p style=\"margin: 4px 0 0 0; font-size: 14px; color: #6b7280;\">Fewer memory tokens injected per session with QMD vs full file loading<\/p>\n<\/div>\n<h3>Quick Setup<\/h3>\n<p>QMD requires WSL2 on Windows. It does <strong>not<\/strong> work on native Windows (missing sqlite-vec binary, tsx module errors).<\/p>\n<ol>\n<li>Install build tools: <code>sudo apt-get install -y build-essential<\/code><\/li>\n<li>Install QMD: <code>npm install -g @tobilu\/qmd<\/code><\/li>\n<li>Verify: <code>qmd --version<\/code> (should show 1.0.7+)<\/li>\n<li>First run auto-downloads ~2GB of GGUF models (one-time)<\/li>\n<\/ol>\n<p>Then add this to your <code>openclaw.json<\/code>:<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">OPENCLAW.JSON \u2014 QMD MEMORY BACKEND<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">\"memory\": {\n  \"backend\": \"qmd\",\n  \"qmd\": {\n    \"searchMode\": \"hybrid\",\n    \"includeDefaultMemory\": true,\n    \"paths\": [\"~\/openclaw-workspace\/memory\"],\n    \"updateInterval\": 300,\n    \"maxResults\": 5\n  }\n}<\/pre>\n<\/div>\n<div style=\"background: #f0f9ff; border-left: 4px solid #0ea5e9; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: 
#0369a1;\">&#128161; Pro Tip<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">QMD runs 100% locally. No API calls, no data leaves your machine. Search latency is about 47ms per lookup. Install it once your memory files exceed ~2,000 tokens total \u2014 before that, full file loading is fine.<\/p>\n<\/div>\n<p><strong>Related<\/strong>: <a href=\"\/ai-automation\/qmd-local-search-setup\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">Setting Up QMD for Local AI Search: Installation &#038; Real Results<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> covers the full walkthrough including the WSL2 gotchas we hit.<\/p>\n<hr>\n<h2>Budget Controls<\/h2>\n<p>Even with perfect routing, mistakes happen. A runaway loop, a forgotten <code>\/model opus<\/code> switch, or a sub-agent that escalates unexpectedly. Budget guardrails are your safety net.<\/p>\n<h3>OpenRouter Daily Limit<\/h3>\n<ol>\n<li>Go to <a href=\"https:\/\/openrouter.ai\/settings\/limits\" rel=\"nofollow noopener external noreferrer\" target=\"_blank\" data-wpel-link=\"external\">openrouter.ai\/settings\/limits<\/a><\/li>\n<li>Create a guardrail: <strong>$3\/day<\/strong><\/li>\n<li>Assign it to your OpenClaw API key<\/li>\n<\/ol>\n<p>This caps your worst-case at ~$90\/month. Our expected spend is $18-27\/month, so the $3\/day limit gives plenty of headroom for busy days without allowing runaway costs.<\/p>\n<div style=\"background: #fef2f2; border-left: 4px solid #ef4444; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #dc2626;\">&#9888;&#65039; Warning<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">Without a budget guardrail, a single misconfigured batch job could burn $50+ in one night. We&#8217;ve seen reports of users hitting $200+ bills from automation loops that escalated to Opus. 
Set the guardrail before going live.<\/p>\n<\/div>\n<h3>Monitoring Workflow<\/h3>\n<p>Check these regularly:<\/p>\n<ul style=\"list-style: none; padding-left: 0;\">\n<li style=\"padding: 4px 0;\">&#10004; <strong>Daily:<\/strong> Run <code>\/status<\/code> to check context size and current model<\/li>\n<li style=\"padding: 4px 0;\">&#10004; <strong>Weekly:<\/strong> Review OpenRouter dashboard for per-model cost breakdown<\/li>\n<li style=\"padding: 4px 0;\">&#10004; <strong>Monthly:<\/strong> Screenshot dashboard, compare against estimates, adjust tiers if needed<\/li>\n<\/ul>\n<hr>\n<h2>Per-Agent Configuration<\/h2>\n<p>Instead of one model for everything, define specialized agents with <strong>pinned models and context limits<\/strong>:<\/p>\n<div style=\"overflow-x:auto; margin:24px 0; border-radius:8px; border:1px solid #e2e8f0;\">\n<table style=\"width:100%; border-collapse:collapse; font-size:15px; line-height:1.6;\">\n<thead>\n<tr>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Agent ID<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Pinned Model<\/th>\n<th style=\"text-align:center; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Context Limit<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Purpose<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>content-writer<\/code><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid 
#e2e8f0; color:#334155;\">Claude Sonnet<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">80K tokens<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Article writing, rewrites, content creation<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>seo-analyst<\/code><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Kimi K2.5<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">50K tokens<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">SEO audits, keyword research, competitor analysis<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>data-worker<\/code><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Gemini Flash<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">30K tokens<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">CSV processing, API calls, data extraction<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Default (all others)<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Gemini Flash<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">50K 
tokens<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Everything else starts cheap<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">OPENCLAW.JSON \u2014 AGENT LIST<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">\"list\": [\n  {\n    \"id\": \"content-writer\",\n    \"model\": \"anthropic\/claude-sonnet-4-5-20250929\",\n    \"params\": { \"contextTokens\": 80000 }\n  },\n  {\n    \"id\": \"seo-analyst\",\n    \"model\": \"moonshot\/kimi-k2.5\",\n    \"params\": { \"contextTokens\": 50000 }\n  },\n  {\n    \"id\": \"data-worker\",\n    \"model\": \"google\/gemini-2.0-flash-001\",\n    \"params\": { \"contextTokens\": 30000 }\n  }\n]<\/pre>\n<\/div>\n<p>Context limits matter. A data worker processing CSVs doesn&#8217;t need 80K tokens of context. 
Capping it at 30K forces compaction earlier and keeps costs tight.<\/p>\n<p>Compaction itself runs on Flash \u2014 don&#8217;t waste Sonnet tokens on mechanical text summarization.<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">OPENCLAW.JSON \u2014 COMPACTION<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">\"compaction\": {\n  \"model\": \"google\/gemini-2.0-flash-001\"\n}<\/pre>\n<\/div>\n<div style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 12px; padding: 24px 32px; margin: 32px 0; color: white; text-align: center;\">\n<h3 style=\"color: white; margin-top: 0; font-size: 22px;\">Running a Multi-Agent SEO Operation?<\/h3>\n<p style=\"color: rgba(255,255,255,0.9); font-size: 16px;\">See how we wired OpenClaw + n8n + 10 Python scripts into a full AI SEO stack for $27\/month.<\/p>\n<p style=\"margin-top: 12px;\"><a href=\"\/ai-seo\/ai-seo-operation-full-stack\/\" style=\"color: #fbbf24; text-decoration: underline;\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">Read: Why We Built a $27\/mo AI SEO Operation \u2192<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a><\/p>\n<\/div>\n<p><strong>Related<\/strong>: <a href=\"\/ai-seo\/seo-audit-swarm-ai-agents\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">Building an SEO Audit Swarm with AI Agents<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> shows how our <code>seo-analyst<\/code> and <code>data-worker<\/code> agents work together.<\/p>\n<hr>\n<h2>Common Token Optimization Mistakes<\/h2>\n<h3>Mistake 1: Using Opus for Batch Jobs<\/h3>\n<p>Opus ($15\/1M tokens) 
is a reasoning powerhouse. But if you&#8217;re processing 50 URLs, extracting titles, or running classification tasks \u2014 that&#8217;s Flash territory. We&#8217;ve seen batch jobs that should have cost $0.15 end up costing $22 because nobody switched the model.<\/p>\n<p><strong>Fix<\/strong>: Pin batch and automation tasks to <code>fast<\/code>. Only escalate if the output quality is measurably bad.<\/p>\n<h3>Mistake 2: Timestamps in System Prompts<\/h3>\n<p>If your workspace files inject &#8220;Current Date and Time: March 2, 2026 14:30:05&#8221; into the system prompt, you&#8217;ve just invalidated your entire cache. Every call gets a unique prefix, so every call pays full price.<\/p>\n<p><strong>Fix<\/strong>: Keep workspace files 100% static. Let the model infer the date from conversation context, or inject it into the user message (not the system prompt).<\/p>\n<h3>Mistake 3: Never Compacting Sessions<\/h3>\n<p>OpenClaw sessions grow. A 50-turn conversation about SEO analysis can hit 100K+ tokens. Every subsequent message pays for all that context.<\/p>\n<p><strong>Fix<\/strong>: Run <code>\/compact<\/code> after completing each task. Start <code>\/new<\/code> sessions between unrelated tasks. Check <code>\/status<\/code> regularly \u2014 if context exceeds 30K for a simple task, compact immediately.<\/p>\n<h3>Mistake 4: Loading Entire Files as Memory<\/h3>\n<p>Without QMD, OpenClaw dumps your entire MEMORY.md, all daily notes, and any referenced files directly into context. A 5,000-token memory file is loaded in full even when the conversation only needs one paragraph.<\/p>\n<p><strong>Fix<\/strong>: Install QMD. It returns only the 5 most relevant snippets instead of the entire file, which cuts memory tokens by roughly 90%.<\/p>\n<h3>Mistake 5: Not Setting a Budget Guardrail<\/h3>\n<p>&#8220;I&#8217;ll monitor it manually&#8221; works until it doesn&#8217;t.
One unattended batch job at 3 AM can blow your monthly budget in a single night.<\/p>\n<p><strong>Fix<\/strong>: Set a $3\/day guardrail on OpenRouter immediately. Takes 30 seconds. Prevents the $200 surprise bills that show up on forums regularly.<\/p>\n<hr>\n<h2>The Complete Cost Breakdown<\/h2>\n<p>Here&#8217;s what our operation actually costs with all optimizations applied:<\/p>\n<div style=\"overflow-x:auto; margin:24px 0; border-radius:8px; border:1px solid #e2e8f0;\">\n<table style=\"width:100%; border-collapse:collapse; font-size:15px; line-height:1.6;\">\n<thead>\n<tr>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Category<\/th>\n<th style=\"text-align:center; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">% of Tasks<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Model<\/th>\n<th style=\"text-align:right; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Monthly Cost<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Heartbeats &#038; idle checks<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~15%<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Gemini Flash<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~$1.50<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; 
background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Data extraction &#038; file ops<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">~40%<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Gemini Flash<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">~$4.00<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">SEO analysis &#038; browsing<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~15%<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Kimi K2.5<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~$5.00<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Content writing<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">~25%<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Claude Sonnet<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">~$14.00<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Architecture &#038; debugging<\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~5%<\/td>\n<td style=\"text-align:left; padding:10px 16px; 
background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Claude Opus<\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~$3.00<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>Total<\/strong><\/td>\n<td style=\"text-align:center; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>100%<\/strong><\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>Mixed<\/strong><\/td>\n<td style=\"text-align:right; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><strong>~$27.50<\/strong><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"background: #ecfdf5; border: 2px solid #10b981; border-radius: 12px; padding: 20px 24px; margin: 24px 0; text-align: center;\">\n<p style=\"margin: 0; font-size: 14px; color: #059669; font-weight: 600;\">MONTHLY COST COMPARISON<\/p>\n<p style=\"margin: 8px 0 0 0; font-size: 36px; font-weight: bold; color: #047857;\">$27 vs $87<\/p>\n<p style=\"margin: 4px 0 0 0; font-size: 14px; color: #6b7280;\">Same quality output. Same number of tasks. Just smarter routing.<\/p>\n<\/div>\n<p>The writing budget ($14\/month on Sonnet) is non-negotiable. Content quality is what drives SEO rankings. You save everywhere else so you can afford to spend here.<\/p>\n<blockquote style=\"border-left: 4px solid #6366f1; background: #eef2ff; padding: 20px 24px; margin: 24px 0; border-radius: 0 8px 8px 0;\">\n<p style=\"margin: 0; font-style: italic; color: #312e81; font-size: 16px; line-height: 1.6;\">&#8220;The teams that understand model routing will build 10x more with the same budget. 
It&#8217;s not about spending less \u2014 it&#8217;s about spending on the right tokens.&#8221;<\/p>\n<p style=\"margin: 12px 0 0 0; font-size: 14px; color: #4338ca; font-weight: 600;\">\u2014 Matt Ganzak, OpenClaw Token Optimization Guide, 2026<\/p>\n<\/blockquote>\n<hr>\n<h2>Security Guardrails (Non-Negotiable)<\/h2>\n<p>Token optimization shouldn&#8217;t compromise security. These are the guardrails we never disable:<\/p>\n<ul style=\"list-style: none; padding-left: 0;\">\n<li style=\"padding: 6px 0;\">&#128274; Heartbeat uses API model with prompt injection resistance (not local Ollama)<\/li>\n<li style=\"padding: 6px 0;\">&#128274; Gateway bound to <code>127.0.0.1<\/code> only \u2014 never exposed to the network<\/li>\n<li style=\"padding: 6px 0;\">&#128274; Token-based gateway authentication<\/li>\n<li style=\"padding: 6px 0;\">&#128274; Phone\/user allowlist on messaging channel<\/li>\n<li style=\"padding: 6px 0;\">&#128274; HEARTBEAT.md kept empty \u2014 minimal attack surface during heartbeat cycles<\/li>\n<li style=\"padding: 6px 0;\">&#128274; Never use local models for tasks involving untrusted content (web scraping, email processing)<\/li>\n<\/ul>\n<p><strong>Related<\/strong>: <a href=\"\/ai-automation\/openclaw-security-clawhavoc\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">Securing Your AI Agent in 2026: ClawHavoc &#038; CVE-2026-25253<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> covers the ClawHavoc supply chain attack (1,184 malicious skills), the WebSocket RCE, and how we hardened against them.<\/p>\n<hr>\n<h2>Frequently Asked Questions<\/h2>\n<h3>Is OpenClaw free to run?<\/h3>\n<p><strong>OpenClaw itself is free and open-source.<\/strong> The cost comes from the AI models it calls through APIs like OpenRouter. With our optimized config, expect $18-27\/month for a production SEO operation. 
A minimal personal assistant setup can run under $5\/month on Gemini Flash alone.<\/p>\n<h3>Can I use Ollama to make it completely free?<\/h3>\n<p><strong>Technically yes, but we don&#8217;t recommend it for production.<\/strong> Local models lack the prompt injection hardening of API models. For a personal hobby project with no sensitive data, Ollama is fine. For a business operation handling credentials, emails, and financial data, use API models whose safety training makes them more resistant to prompt injection. Gemini Flash at $0.10\/1M tokens is nearly free anyway.<\/p>\n<h3>How much does Claude Opus cost on OpenClaw?<\/h3>\n<p><strong>About $15 per million tokens.<\/strong> In our setup, Opus handles roughly 5% of tasks (architecture decisions, complex debugging, security audits), costing about $3\/month. The key is never letting Opus touch routine tasks. A single batch job accidentally routed to Opus can cost more than your entire month of Flash usage.<\/p>\n<h3>Does prompt caching work with OpenRouter?<\/h3>\n<p><strong>Yes, with caveats.<\/strong> Set <code>cacheRetention: \"long\"<\/code> for a 1-hour TTL. Cached reads get a 90% discount on Anthropic models. However, some OpenRouter provider pass-throughs don&#8217;t forward <code>cache_control<\/code> breakpoints properly (see GitHub Issue #9600). Verify by checking that <code>cacheRead > 0<\/code> after multiple turns in the same session.<\/p>\n<h3>What&#8217;s the minimum setup for token optimization?<\/h3>\n<p><strong>Three changes that take five minutes:<\/strong><\/p>\n<ol>\n<li>Set <code>\"model\": \"google\/gemini-2.0-flash-001\"<\/code> as the default in openclaw.json<\/li>\n<li>Set <code>\"heartbeat.model\": \"google\/gemini-2.0-flash-001\"<\/code><\/li>\n<li>Set a $3\/day budget guardrail on OpenRouter<\/li>\n<\/ol>\n<p>That alone cuts 50-60% off most users&#8217; bills.
Add caching, QMD, and per-agent configs later for the remaining savings.<\/p>\n<h3>How do I check if my optimizations are working?<\/h3>\n<p><strong>Run <code>\/status<\/code> in any OpenClaw conversation.<\/strong> It shows your current model, context size, and token usage. Then check the OpenRouter dashboard for per-model spending breakdown. After 24 hours, verify: heartbeats hit Flash (not Sonnet\/Opus), writing tasks hit Sonnet, and daily spend stays under $3.<\/p>\n<hr>\n<h2>Getting Started: Your Next Steps<\/h2>\n<div style=\"background: #fffbeb; border: 2px solid #f59e0b; border-radius: 12px; padding: 24px; margin: 32px 0;\">\n<h3 style=\"margin-top: 0; color: #92400e;\">&#9745; Quick-Start Checklist<\/h3>\n<ul style=\"list-style: none; padding-left: 0;\">\n<li style=\"padding: 6px 0;\">&#9744; Set default model to Gemini Flash in openclaw.json<\/li>\n<li style=\"padding: 6px 0;\">&#9744; Pin heartbeat to Gemini Flash at 55-minute interval<\/li>\n<li style=\"padding: 6px 0;\">&#9744; Set <code>cacheRetention: \"long\"<\/code><\/li>\n<li style=\"padding: 6px 0;\">&#9744; Set $3\/day budget guardrail on OpenRouter<\/li>\n<li style=\"padding: 6px 0;\">&#9744; Trim workspace files (remove template boilerplate)<\/li>\n<li style=\"padding: 6px 0;\">&#9744; Define per-agent models and context limits<\/li>\n<li style=\"padding: 6px 0;\">&#9744; Install QMD in WSL2 for local memory search<\/li>\n<li style=\"padding: 6px 0;\">&#9744; Run <code>\/status<\/code> and verify after 24 hours<\/li>\n<\/ul>\n<\/div>\n<p>Here&#8217;s where to go based on your situation:<\/p>\n<ul style=\"list-style: none; padding-left: 0;\">\n<li style=\"padding: 4px 0;\">&#128640; <strong>Just getting started?<\/strong> Apply the 3 quick fixes from the FAQ above. 
Takes 5 minutes, saves 50%.<\/li>\n<li style=\"padding: 4px 0;\">&#128640; <strong>Want to go deeper?<\/strong> Read <a href=\"\/ai-automation\/model-routing-cost-reduction\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">How I Cut Costs by 70% with Model Routing<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> for the full routing breakdown.<\/li>\n<li style=\"padding: 4px 0;\">&#128640; <strong>Building an SEO operation?<\/strong> See <a href=\"\/ai-seo\/ai-seo-operation-full-stack\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">Why We Built a $27\/mo AI SEO Operation<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> for the complete stack.<\/li>\n<li style=\"padding: 4px 0;\">&#128640; <strong>Concerned about security?<\/strong> Start with <a href=\"\/ai-automation\/openclaw-security-clawhavoc\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">our security hardening guide<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> before optimizing for cost.<\/li>\n<\/ul>\n<div style=\"background: #f8fafc; border: 2px solid #e2e8f0; border-radius: 12px; padding: 24px; margin: 32px 0;\">\n<h3 style=\"margin-top: 0; color: #1e293b;\">&#128270; Key Takeaways<\/h3>\n<ul>\n<li><strong>Model routing is the biggest lever<\/strong> \u2014 switching defaults from auto to Gemini Flash cuts 50-60% immediately<\/li>\n<li><strong>Heartbeats should use the cheapest API model<\/strong> \u2014 not Ollama (security risk) and not your default writer model<\/li>\n<li><strong>Prompt caching gives 90% discounts<\/strong> \u2014 but only if your system prompts are 100% static<\/li>\n<li><strong>QMD reduces memory tokens by 90%<\/strong> \u2014 install it once your memory files grow past 2,000 tokens<\/li>\n<li><strong>Budget guardrails are 
non-negotiable<\/strong> \u2014 $3\/day on OpenRouter prevents surprise bills<\/li>\n<li><strong>Our real result: $87\/month down to $27\/month<\/strong> \u2014 same output quality, smarter routing<\/li>\n<\/ul>\n<\/div>\n<p>Explore our complete <a href=\"\/ai-automation\/\" data-wpel-link=\"internal\" rel=\"noopener noreferrer follow\" class=\"wpel-icon-right\">AI Automation &#038; Workflows hub<i class=\"wpel-icon dashicons-before dashicons-admin-page\" aria-hidden=\"true\"><\/i><\/a> for more guides on building production AI agent systems.<\/p>\n<p><!-- designcopy-schema-start --><br \/>\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"Article\",\n  \"headline\": \"OpenClaw Token Optimization: The Complete 2026 Guide\",\n  \"description\": \"We ran OpenClaw at its defaults for three weeks. The bill? $87 in a single month. Most of it was wasted on a frontier model doing simple file reads.\",\n  \"author\": {\n    \"@type\": \"Person\",\n    \"name\": \"DesignCopy\"\n  },\n  \"datePublished\": \"2026-03-02T16:34:49\",\n  \"dateModified\": \"2026-03-07T13:48:01\",\n  \"image\": {\n    \"@type\": \"ImageObject\",\n    \"url\": \"https:\/\/designcopy.net\/wp-content\/uploads\/logo.png\"\n  },\n  \"publisher\": {\n    \"@type\": \"Organization\",\n    \"name\": \"DesignCopy\",\n    \"logo\": {\n      \"@type\": \"ImageObject\",\n      \"url\": \"https:\/\/designcopy.net\/wp-content\/uploads\/logo.png\"\n    }\n  },\n  \"mainEntityOfPage\": {\n    \"@type\": \"WebPage\",\n    \"@id\": \"https:\/\/designcopy.net\/en\/openclaw-token-optimization-guide\/\"\n  }\n}\n<\/script><br \/>\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"FAQPage\",\n  \"mainEntity\": [\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What Is Token Optimization (And Why Your OpenClaw Bill Is Too High)\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n     
   \"text\": \"Token optimization means spending the least amount of money per task without degrading output quality. In OpenClaw, every message you send, every heartbeat check, every sub-agent call burns tokens. And tokens cost money. The problem is straightforward. OpenClaw\u2019s default config uses openrouter\/auto, which auto-selects models based on availability \u2014 not cost. That means your heartbeat (a simple \u201care you alive?\u201d check that runs every hour) might hit Claude Opus at $15 per million tokens instead of Gemini Flash at $0.10.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Want the Full Config File?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"We\u2019ve open-sourced our complete openclaw.json with all 5 model tiers, agent configs, and caching settings. Grab it from our GitHub repo.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Why Not Ollama (Free)?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Some guides recommend running heartbeats on Ollama, a free local model. We tried it. Don\u2019t. This isn\u2019t theoretical. Security researchers at Palo Alto Networks have documented prompt injection attacks against AI agents. The heartbeat is a particularly attractive target because it runs unattended. Related: Securing Your AI Agent: ClawHavoc, CVE-2026-25253 & How We Hardened covers the full security picture.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What Destroys Caching\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Three things will kill your cache hit rate:\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Running a Multi-Agent SEO Operation?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"See how we wired OpenClaw + n8n + 10 Python scripts into a full AI SEO stack for $27\/month.
Read: Why We Built a $27\/mo AI SEO Operation \u2192\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Is OpenClaw free to run?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"OpenClaw itself is free and open-source. The cost comes from the AI models it calls through APIs like OpenRouter. With our optimized config, expect $18-27\/month for a production SEO operation. A minimal personal assistant setup can run under $5\/month on Gemini Flash alone.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Can I use Ollama to make it completely free?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Technically yes, but we don\u2019t recommend it for production. Local models lack the prompt injection hardening of API models. For a personal hobby project with no sensitive data, Ollama is fine. For a business operation handling credentials, emails, and financial data \u2014 use API models with training-level security. Gemini Flash at $0.10\/1M tokens is nearly free anyway.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"How much does Claude Opus cost on OpenClaw?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"About $15 per million tokens. In our setup, Opus handles roughly 5% of tasks (architecture decisions, complex debugging, security audits), costing about $3\/month. The key is never letting Opus touch routine tasks. A single batch job accidentally routed to Opus can cost more than your entire month of Flash usage.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Does prompt caching work with OpenRouter?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Yes, with caveats. Set cacheRetention: \\\"long\\\" for a 1-hour TTL. Cached reads get a 90% discount on Anthropic models. 
However, some OpenRouter provider pass-throughs don\u2019t forward cache_control breakpoints properly (see GitHub Issue #9600). Verify by checking that cacheRead > 0 after multiple turns in the same session.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What\u2019s the minimum setup for token optimization?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Three changes that take five minutes: (1) set \\\"model\\\": \\\"google\/gemini-2.0-flash-001\\\" as the default in openclaw.json; (2) set \\\"heartbeat.model\\\": \\\"google\/gemini-2.0-flash-001\\\"; (3) set a $3\/day budget guardrail on OpenRouter. That alone cuts 50-60% off most users\u2019 bills. Add caching, QMD, and per-agent configs later for the remaining savings.\"\n      }\n    }\n  ]\n}\n<\/script><br \/>\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"WebPage\",\n  \"name\": \"OpenClaw Token Optimization: The Complete 2026 Guide\",\n  \"url\": \"https:\/\/designcopy.net\/en\/openclaw-token-optimization-guide\/\",\n  \"speakable\": {\n    \"@type\": \"SpeakableSpecification\",\n    \"cssSelector\": [\n      \"h1\",\n      \"h2\",\n      \"p\"\n    ]\n  }\n}\n<\/script><br \/>\n<!-- designcopy-schema-end --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>We ran OpenClaw at its defaults for three weeks. The bill? $87 in a single month. Most of it was wasted on a frontier model doing simple file reads. This guide breaks down every optimization we applied to get that number down to $27\/month \u2014 without losing quality on the tasks that matter.
You&#8217;ll get [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":262018,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","rank_math_title":"","rank_math_description":"","rank_math_focus_keyword":"","footnotes":""},"categories":[1435],"tags":[304],"class_list":["post-261986","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-seo","tag-ai-tools","et-has-post-format-content","et_post_format-et-post-format-standard"],"_links":{"self":[{"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/posts\/261986","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/comments?post=261986"}],"version-history":[{"count":5,"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/posts\/261986\/revisions"}],"predecessor-version":[{"id":263742,"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/posts\/261986\/revisions\/263742"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/media\/262018"}],"wp:attachment":[{"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/media?parent=261986"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/categories?post=261986"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/designcopy.net\/ko\/wp-json\/wp\/v2\/tags?post=261986"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}