{"id":261992,"date":"2026-03-02T16:34:30","date_gmt":"2026-03-02T07:34:30","guid":{"rendered":"https:\/\/designcopy.net\/en\/?p=261992"},"modified":"2026-04-04T13:33:44","modified_gmt":"2026-04-04T04:33:44","slug":"qmd-local-search-setup-guide","status":"publish","type":"post","link":"https:\/\/designcopy.net\/en\/qmd-local-search-setup-guide\/","title":{"rendered":"Setting Up QMD for Local AI Search: Installation and Real Results"},"content":{"rendered":"<p>Every OpenClaw session starts with the same problem: your entire MEMORY.md gets injected into the API call. Every single turn. That\u2019s 2,000+ tokens of context \u2014 whether the conversation needs it or not.<\/p>\n<p>QMD fixes this. Built by Tobi Lutke (Shopify\u2019s CEO), QMD replaces brute-force memory injection with local hybrid search. It combines BM25 keyword matching, vector embeddings, and LLM reranking to retrieve only the 3-5 snippets that actually matter for each query.<\/p>\n<p>The results from our QMD local search setup: <strong>90% fewer memory tokens per turn<\/strong>, <strong>47ms average search latency<\/strong>, and <strong>zero data leaving our machine<\/strong>. Here\u2019s exactly how to install, configure, and verify it.<\/p>\n<hr\/>\n<h2>What QMD Does (And Why You Need It)<\/h2>\n<p>Without QMD, OpenClaw\u2019s memory system works like a sledgehammer. It reads your full MEMORY.md file \u2014 plus any daily memory files \u2014 and dumps all of it into every conversation turn. If your memory files total 2,000 tokens, you\u2019re burning 2,000 tokens on context every single time you send a message.<\/p>\n<p>With QMD, that same memory gets indexed locally. 
When you ask a question, QMD searches your memory files and returns only the relevant snippets \u2014 typically 3-5 matches totaling around 200 tokens.<\/p>\n<div style=\"background: #ecfdf5; border: 2px solid #10b981; border-radius: 12px; padding: 20px 24px; margin: 24px 0; text-align: center;\">\n<p style=\"margin: 0; font-size: 14px; color: #059669; font-weight: 600;\">MEMORY TOKEN REDUCTION<\/p>\n<p style=\"margin: 8px 0 0 0; font-size: 36px; font-weight: bold; color: #047857;\">90%<\/p>\n<p style=\"margin: 4px 0 0 0; font-size: 14px; color: #6b7280;\">~2,000 tokens \u2192 ~200 tokens per turn<\/p>\n<\/div>\n<p>Here\u2019s what makes QMD different from cloud-based RAG solutions:<\/p>\n<ul>\n<li><strong>100% local execution<\/strong> \u2014 no API calls, no data transmitted anywhere<\/li>\n<li><strong>Hybrid search pipeline<\/strong> \u2014 BM25 (keyword matching) + vector embeddings + LLM reranking<\/li>\n<li><strong>Open-source<\/strong> \u2014 created by Tobi Lutke, available on npm<\/li>\n<li><strong>Sub-50ms latency<\/strong> \u2014 faster than a single API roundtrip<\/li>\n<\/ul>\n<p>The hybrid approach matters. BM25 alone misses semantic connections. Vector search alone misses exact keyword matches. QMD runs both, then uses a lightweight LLM reranker to sort the combined results by relevance.<\/p>\n<hr\/>\n<h2>Prerequisites<\/h2>\n<p>Before starting your QMD local search setup, you need a few things in place. The biggest requirement \u2014 and the one most people trip over \u2014 is WSL2.<\/p>\n<div style=\"background: #fef2f2; border-left: 4px solid #ef4444; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #dc2626;\">&#x26a0;&#xfe0f; Warning<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">QMD will NOT install on native Windows. The npm package depends on a native sqlite-vec binary that doesn\u2019t compile on Windows. 
You must use WSL2 (Windows Subsystem for Linux).<\/p>\n<\/div>\n<p><strong>What you need:<\/strong><\/p>\n<div style=\"overflow-x:auto; margin:24px 0; border-radius:8px; border:1px solid #e2e8f0;\">\n<table style=\"width:100%; border-collapse:collapse; font-size:15px; line-height:1.6;\">\n<thead>\n<tr>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Requirement<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Minimum Version<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Check Command<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">WSL2 (Ubuntu recommended)<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">WSL 2.0+<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>wsl --version<\/code><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Node.js (inside WSL2)<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">22.0+<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>node --version<\/code><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">build-essential<\/td>\n<td style=\"text-align:left; padding:10px 
16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Latest<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>dpkg -l build-essential<\/code><\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Free disk space<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">~3 GB<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\"><code>df -h<\/code><\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<p>Run these quick checks inside your WSL2 terminal:<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">PREREQUISITE CHECK<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">node --version       # Must show v22.x or higher\ngcc --version        # Must return a version (any)\nmake --version       # Must return a version (any)<\/pre>\n<\/div>\n<p>If <code>node --version<\/code> shows anything below 22, update Node inside WSL2 before proceeding. If <code>gcc<\/code> or <code>make<\/code> aren\u2019t found, don\u2019t worry \u2014 we\u2019ll install them in Step 3 below.<\/p>\n<hr\/>\n<h2>Step-by-Step Installation<\/h2>\n<p>Follow these steps in order inside your <strong>WSL2 terminal<\/strong> (not PowerShell, not CMD).<\/p>\n<h3>Step 1: Open WSL2<\/h3>\n<p>Launch your WSL2 Ubuntu terminal. 
You can do this from Windows Terminal, or type <code>wsl<\/code> in PowerShell.<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">LAUNCH WSL2<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">wsl<\/pre>\n<\/div>\n<h3>Step 2: Verify Node.js Version<\/h3>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">CHECK NODE VERSION<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">node --version<\/pre>\n<\/div>\n<p>You should see <code>v22.x.x<\/code> or higher. If not, install Node 22 via nvm or your preferred method before continuing.<\/p>\n<h3>Step 3: Install build-essential<\/h3>\n<p>This package provides <code>gcc<\/code>, <code>g++<\/code>, and <code>make<\/code> \u2014 all required for compiling QMD\u2019s native sqlite-vec module.<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">INSTALL BUILD TOOLS<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">sudo apt-get update &amp;&amp; sudo apt-get install -y build-essential<\/pre>\n<\/div>\n<div style=\"background: #f0f9ff; border-left: 4px solid #0ea5e9; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #0369a1;\">&#x1f4a1; Pro Tip<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">We hit a build-essential error 
on our first try. The npm install requires gcc and make to compile the native sqlite-vec module. If you skip this step, you\u2019ll get a cryptic <code>node-gyp<\/code> error during Step 4. Don\u2019t skip it.<\/p>\n<\/div>\n<h3>Step 4: Install QMD Globally<\/h3>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">INSTALL QMD<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">npm install -g @tobilu\/qmd<\/pre>\n<\/div>\n<p>This may take 1-2 minutes. You\u2019ll see compilation output for the native modules \u2014 that\u2019s normal.<\/p>\n<h3>Step 5: Verify Installation<\/h3>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">VERIFY QMD<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">qmd --version<\/pre>\n<\/div>\n<p>Expected output: <code>qmd v1.0.7<\/code> or higher. If you see this, you\u2019re ready to configure.<\/p>\n<div style=\"background: #fef2f2; border-left: 4px solid #ef4444; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #dc2626;\">&#x26a0;&#xfe0f; Warning<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">Forgot your WSL2 Ubuntu password? You can reset it from PowerShell: <code>wsl -u root<\/code>, then <code>passwd yourusername<\/code>. Replace <code>yourusername<\/code> with your actual WSL2 username. 
<\/p>\n<\/div>\n<div style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 12px; padding: 24px 32px; margin: 32px 0; color: white; text-align: center;\">\n<h3 style=\"color: white; margin-top: 0; font-size: 22px;\">Want the Full Token Optimization Stack?<\/h3>\n<p style=\"color: rgba(255,255,255,0.9); font-size: 16px;\">QMD is one piece of the puzzle. Read our full <a class=\"wpel-icon-right\" data-wpel-link=\"internal\" href=\"\/ai-automation\/openclaw-token-optimization-guide\/\" rel=\"noopener noreferrer follow\" style=\"color: #fbbf24; text-decoration: underline;\">OpenClaw Token Optimization Guide<i aria-hidden=\"true\" class=\"wpel-icon dashicons-before dashicons-admin-page\"><\/i><\/a> for model routing, memory management, and cost tracking.<\/p>\n<\/div>\n<hr\/>\n<h2>Configuring QMD in openclaw.json<\/h2>\n<p>Once QMD is installed, you need to tell OpenClaw to use it. 
Open your <code>openclaw.json<\/code> configuration file and add (or update) the <code>memory<\/code> block.<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">openclaw.json \u2014 MEMORY CONFIGURATION<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">\"memory\": {\n  \"backend\": \"qmd\",\n  \"qmd\": {\n    \"searchMode\": \"hybrid\",\n    \"includeDefaultMemory\": true,\n    \"paths\": [\"~\/openclaw-workspace\/memory\"],\n    \"updateInterval\": 300,\n    \"maxResults\": 5\n  }\n}<\/pre>\n<\/div>\n<p>Here\u2019s what each field controls:<\/p>\n<ol>\n<li><strong><code>backend: \"qmd\"<\/code><\/strong> \u2014 Switches OpenClaw from full-file injection to QMD-powered search<\/li>\n<li><strong><code>searchMode: \"hybrid\"<\/code><\/strong> \u2014 Combines BM25 keyword matching with vector similarity search. This is the recommended mode; you can also use <code>\"bm25\"<\/code> or <code>\"vector\"<\/code> alone, but hybrid produces the best recall<\/li>\n<li><strong><code>includeDefaultMemory: true<\/code><\/strong> \u2014 Still loads critical baseline items from MEMORY.md (like your name, project context). Set to <code>false<\/code> if you want QMD to handle everything<\/li>\n<li><strong><code>paths<\/code><\/strong> \u2014 An array of directories QMD should index. Point this at your memory folder(s)<\/li>\n<li><strong><code>updateInterval: 300<\/code><\/strong> \u2014 Re-indexes every 300 seconds (5 minutes). Lower values mean fresher search results but slightly more CPU usage<\/li>\n<li><strong><code>maxResults: 5<\/code><\/strong> \u2014 Returns the top 5 most relevant snippets per query. 
We\u2019ve found 5 to be the sweet spot \u2014 enough context without token bloat<\/li>\n<\/ol>\n<div style=\"background: #f0f9ff; border-left: 4px solid #0ea5e9; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #0369a1;\">&#x1f4a1; Pro Tip<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">Keep <code>includeDefaultMemory<\/code> set to <code>true<\/code> when you first switch to QMD. This ensures your core identity and project context still loads, while QMD handles the supplementary memory search. You can experiment with <code>false<\/code> later once you trust the search quality.<\/p>\n<\/div>\n<hr\/>\n<h2>First Run and Indexing<\/h2>\n<p>With QMD installed and configured, it\u2019s time to build the initial search index.<\/p>\n<h3>Step 6: Run the Initial Index<\/h3>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">BUILD THE INDEX<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">qmd index ~\/openclaw-workspace\/memory<\/pre>\n<\/div>\n<p><strong>Important:<\/strong> The first run downloads an embedding model of approximately 2 GB. This is a one-time download. 
Subsequent indexing runs take only a few seconds, even with hundreds of memory files.<\/p>\n<h3>Step 7: Verify Search Works<\/h3>\n<p>Run a test query to confirm everything is wired up:<\/p>\n<div style=\"background: #1e293b; border-radius: 8px; padding: 20px; margin: 24px 0; overflow-x: auto;\">\n<p style=\"margin: 0 0 8px 0; font-size: 12px; color: #94a3b8; font-weight: 600;\">TEST SEARCH QUERY<\/p>\n<pre style=\"margin: 0; color: #e2e8f0; font-family: 'Fira Code', 'Courier New', monospace; font-size: 14px; line-height: 1.6; white-space: pre-wrap;\">qmd search \"model routing\" --limit 3<\/pre>\n<\/div>\n<p>You should see 1-3 results with relevance scores. If QMD returns results from your memory files, the setup is complete. If it returns nothing, double-check that the <code>paths<\/code> in your <code>openclaw.json<\/code> point to a directory containing <code>.md<\/code> files.<\/p>\n<div style=\"background: #f0f9ff; border-left: 4px solid #0ea5e9; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #0369a1;\">&#x1f4a1; Pro Tip<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">After the initial index, QMD re-indexes automatically based on your <code>updateInterval<\/code> setting. But if you add a large batch of new memory files, run <code>qmd index<\/code> manually to pick them up immediately.<\/p>\n<\/div>\n<div style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 12px; padding: 24px 32px; margin: 32px 0; color: white; text-align: center;\">\n<h3 style=\"color: white; margin-top: 0; font-size: 22px;\">Save More With Model Routing<\/h3>\n<p style=\"color: rgba(255,255,255,0.9); font-size: 16px;\">QMD cuts memory tokens. Model routing cuts everything else. 
See our <a class=\"wpel-icon-right\" data-wpel-link=\"internal\" href=\"\/ai-automation\/ai-agent-cost-reduction-model-routing\/\" rel=\"noopener noreferrer follow\" style=\"color: #fbbf24; text-decoration: underline;\">AI Agent Cost Reduction via Model Routing<i aria-hidden=\"true\" class=\"wpel-icon dashicons-before dashicons-admin-page\"><\/i><\/a> guide for the full strategy.<\/p>\n<\/div>\n<hr\/>\n<h2>Real Results \u2014 Before and After<\/h2>\n<p>We ran QMD for two weeks on our production OpenClaw setup before writing this post. Here\u2019s what the numbers looked like.<\/p>\n<div style=\"overflow-x:auto; margin:24px 0; border-radius:8px; border:1px solid #e2e8f0;\">\n<table style=\"width:100%; border-collapse:collapse; font-size:15px; line-height:1.6;\">\n<thead>\n<tr>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Metric<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">Before QMD<\/th>\n<th style=\"text-align:left; padding:12px 16px; background:#1e293b; color:#f1f5f9; font-weight:600; font-size:14px; border-bottom:2px solid #334155; white-space:nowrap;\">After QMD<\/th>\n<\/tr>\n<\/thead>\n<tbody>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Memory tokens per turn<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~2,000<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~200<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Search latency<\/td>\n<td style=\"text-align:left; padding:10px 16px; 
background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">N\/A (full file load)<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">47ms average<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Data leaving machine<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Depends on model provider<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Never<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">Estimated monthly memory cost<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">~$5\u20138 wasted<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#ffffff; border-bottom:1px solid #e2e8f0; color:#334155;\">~$0.50<\/td>\n<\/tr>\n<tr>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">Setup time<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">None<\/td>\n<td style=\"text-align:left; padding:10px 16px; background:#f8fafc; border-bottom:1px solid #e2e8f0; color:#334155;\">~20 minutes<\/td>\n<\/tr>\n<\/tbody>\n<\/table>\n<\/div>\n<div style=\"background: #ecfdf5; border: 2px solid #10b981; border-radius: 12px; padding: 20px 24px; margin: 24px 0; text-align: center;\">\n<p style=\"margin: 0; font-size: 14px; color: #059669; font-weight: 600;\">AVERAGE SEARCH LATENCY<\/p>\n<p style=\"margin: 8px 0 0 0; font-size: 36px; font-weight: bold; color: #047857;\">47ms<\/p>\n<p style=\"margin: 4px 0 0 0; font-size: 14px; color: #6b7280;\">Measured over 
14 days, ~200 queries\/day<\/p>\n<\/div>\n<div style=\"background: #ecfdf5; border: 2px solid #10b981; border-radius: 12px; padding: 20px 24px; margin: 24px 0; text-align: center;\">\n<p style=\"margin: 0; font-size: 14px; color: #059669; font-weight: 600;\">DATA EXPOSURE RISK<\/p>\n<p style=\"margin: 8px 0 0 0; font-size: 36px; font-weight: bold; color: #047857;\">$0<\/p>\n<p style=\"margin: 4px 0 0 0; font-size: 14px; color: #6b7280;\">All search runs locally \u2014 nothing sent to external servers<\/p>\n<\/div>\n<p>The cost savings alone justified the 20-minute setup. But the privacy benefit is the bigger win for teams handling sensitive project data.<\/p>\n<hr\/>\n<h2>When to Install QMD<\/h2>\n<p>Not every OpenClaw setup needs QMD. Here\u2019s a quick decision framework.<\/p>\n<p><strong>Install QMD when:<\/strong><\/p>\n<ul>\n<li>Your MEMORY.md + daily memory files exceed ~2,000 tokens combined<\/li>\n<li>You run 20+ OpenClaw sessions per day<\/li>\n<li>You store sensitive project data in memory files<\/li>\n<li>You want faster, more relevant memory retrieval<\/li>\n<\/ul>\n<p><strong>Skip QMD for now if:<\/strong><\/p>\n<ul>\n<li>Your total memory files are under 1,000 tokens<\/li>\n<li>You only use OpenClaw occasionally<\/li>\n<li>You don\u2019t want a 2 GB model download on your machine<\/li>\n<\/ul>\n<div style=\"background: #f0f9ff; border-left: 4px solid #0ea5e9; border-radius: 0 8px 8px 0; padding: 16px 20px; margin: 24px 0;\">\n<p style=\"margin: 0; font-weight: 600; color: #0369a1;\">&#x1f4a1; Pro Tip<\/p>\n<p style=\"margin: 8px 0 0 0; color: #334155;\">We installed QMD when our memory files hit 3,500 tokens. The break-even point is around 2,000 tokens \u2014 below that, the full-file injection approach is fine and the 2 GB embedding model download isn\u2019t worth the disk space.<\/p>\n<\/div>\n<hr\/>\n<h2>Troubleshooting Common Mistakes<\/h2>\n<p>Even with a straightforward setup, we hit a few snags. Here are the most common ones and their fixes. 
<\/p>\n<p><strong>1. \u201cnode-gyp rebuild failed\u201d during npm install<\/strong><br \/>\nYou forgot to install <code>build-essential<\/code>. Run <code>sudo apt-get install -y build-essential<\/code> and then retry the npm install.<\/p>\n<p><strong>2. \u201ccommand not found: qmd\u201d after installation<\/strong><br \/>\nYour npm global bin directory isn\u2019t in your PATH. Run <code>npm config get prefix<\/code> to find the global directory, then add <code>{prefix}\/bin<\/code> to your shell\u2019s PATH.<\/p>\n<p><strong>3. QMD returns zero results on search<\/strong><br \/>\nCheck that your <code>paths<\/code> in <code>openclaw.json<\/code> point to a directory containing <code>.md<\/code> files. Also run <code>qmd index<\/code> manually \u2014 the auto-index might not have triggered yet.<\/p>\n<p><strong>4. \u201cPermission denied\u201d on sudo commands<\/strong><br \/>\nReset your WSL2 password from PowerShell: <code>wsl -u root<\/code>, then <code>passwd yourusername<\/code>.<\/p>\n<p><strong>5. First index hangs for 10+ minutes<\/strong><br \/>\nThat\u2019s the 2 GB embedding model downloading. Check your internet connection. If it\u2019s slow, let it finish \u2014 subsequent runs take seconds.<\/p>\n<hr\/>\n<h2>FAQ<\/h2>\n<p><strong>Does QMD work on Windows without WSL2?<\/strong><\/p>\n<p>No. QMD depends on a native sqlite-vec binary that requires a Linux compilation environment. As of early 2026, there\u2019s no Windows-native build. WSL2 is required for Windows users.<\/p>\n<p><strong>How much disk space does QMD need?<\/strong><\/p>\n<p>About 2.5-3 GB total. 
The embedding model is approximately 2 GB, and the search index adds a few hundred MB depending on how many memory files you have.<\/p>\n<p><strong>Can I use QMD with other AI agents besides OpenClaw?<\/strong><\/p>\n<p>Yes. QMD is a standalone search tool. Any agent or application that can call a CLI command can use <code>qmd search<\/code> to retrieve relevant snippets. The <code>openclaw.json<\/code> integration is specific to OpenClaw, but the tool itself is agent-agnostic.<\/p>\n<p><strong>What is QMD\u2019s hybrid search?<\/strong><\/p>\n<p>Hybrid search combines two retrieval methods. BM25 matches documents by keyword frequency (great for exact terms). Vector search matches by semantic meaning (great for related concepts). QMD runs both, merges the results, and uses a lightweight LLM reranker to sort them by relevance.<\/p>\n<p><strong>Who created QMD?<\/strong><\/p>\n<p>QMD was created by Tobi Lutke, the CEO of Shopify. He released it as an open-source tool for local AI memory search. It\u2019s available via npm as <code>@tobilu\/qmd<\/code>.<\/p>\n<hr\/>\n<h2>What to Read Next<\/h2>\n<p>This QMD local search setup is one piece of a larger token optimization stack. 
Here\u2019s where to go from here:<\/p>\n<ul>\n<li><strong>Full optimization strategy:<\/strong> <a class=\"wpel-icon-right\" data-wpel-link=\"internal\" href=\"\/ai-automation\/openclaw-token-optimization-guide\/\" rel=\"noopener noreferrer follow\">OpenClaw Token Optimization Guide<i aria-hidden=\"true\" class=\"wpel-icon dashicons-before dashicons-admin-page\"><\/i><\/a> \u2014 the pillar post covering model routing, memory management, and cost tracking<\/li>\n<li><strong>Cut costs further:<\/strong> <a class=\"wpel-icon-right\" data-wpel-link=\"internal\" href=\"\/ai-automation\/ai-agent-cost-reduction-model-routing\/\" rel=\"noopener noreferrer follow\">AI Agent Cost Reduction via Model Routing<i aria-hidden=\"true\" class=\"wpel-icon dashicons-before dashicons-admin-page\"><\/i><\/a> \u2014 route simple tasks to cheaper models automatically<\/li>\n<li><strong>Browse all guides:<\/strong> <a class=\"wpel-icon-right\" data-wpel-link=\"internal\" href=\"\/ai-automation\/\" rel=\"noopener noreferrer follow\">AI Automation Hub<i aria-hidden=\"true\" class=\"wpel-icon dashicons-before dashicons-admin-page\"><\/i><\/a> \u2014 every post in this series<\/li>\n<\/ul>\n<hr\/>\n<div style=\"background: #fffbeb; border: 2px solid #f59e0b; border-radius: 12px; padding: 24px; margin: 32px 0;\">\n<h3 style=\"margin-top: 0; color: #92400e;\">&#x2611; QMD Setup Checklist<\/h3>\n<ul style=\"list-style: none; padding-left: 0;\">\n<li style=\"padding: 6px 0;\">\u2610 WSL2 installed and running<\/li>\n<li style=\"padding: 6px 0;\">\u2610 Node.js 22+ verified inside WSL2<\/li>\n<li style=\"padding: 6px 0;\">\u2610 build-essential installed (gcc, make)<\/li>\n<li style=\"padding: 6px 0;\">\u2610 QMD installed globally via npm<\/li>\n<li style=\"padding: 6px 0;\">\u2610 <code>qmd --version<\/code> returns v1.0.7+<\/li>\n<li style=\"padding: 6px 0;\">\u2610 openclaw.json memory block configured<\/li>\n<li style=\"padding: 6px 0;\">\u2610 Initial index built with <code>qmd 
index<\/code><\/li>\n<li style=\"padding: 6px 0;\">\u2610 Test search returning relevant results<\/li>\n<\/ul>\n<\/div>\n<div style=\"background: #f8fafc; border: 2px solid #e2e8f0; border-radius: 12px; padding: 24px; margin: 32px 0;\">\n<h3 style=\"margin-top: 0; color: #1e293b;\">&#x1f50e; Key Takeaways<\/h3>\n<ul>\n<li>QMD replaces full memory file injection with local hybrid search \u2014 cutting ~2,000 tokens down to ~200 per turn<\/li>\n<li>It runs entirely on your machine: 47ms latency, zero data transmitted externally<\/li>\n<li>WSL2 is mandatory on Windows \u2014 QMD won\u2019t compile natively<\/li>\n<li>Install build-essential before npm install, or you\u2019ll hit a node-gyp compilation failure<\/li>\n<li>The break-even point is around 2,000 tokens of memory files \u2014 below that, full injection is fine<\/li>\n<\/ul>\n<\/div>\n<div style=\"background: linear-gradient(135deg, #667eea 0%, #764ba2 100%); border-radius: 12px; padding: 24px 32px; margin: 32px 0; color: white; text-align: center;\">\n<h3 style=\"color: white; margin-top: 0; font-size: 22px;\">Ready to Optimize Your Full OpenClaw Stack?<\/h3>\n<p style=\"color: rgba(255,255,255,0.9); font-size: 16px;\">QMD handles memory. Model routing handles cost. Security hardening handles safety. 
Get the complete picture in our <a class=\"wpel-icon-right\" data-wpel-link=\"internal\" href=\"\/ai-automation\/openclaw-token-optimization-guide\/\" rel=\"noopener noreferrer follow\" style=\"color: #fbbf24; text-decoration: underline;\">OpenClaw Token Optimization Guide<i aria-hidden=\"true\" class=\"wpel-icon dashicons-before dashicons-admin-page\"><\/i><\/a>.<\/p>\n<\/div>\n<p><!-- designcopy-schema-start --><br \/>\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"Article\",\n  \"headline\": \"Setting Up QMD for Local AI Search: Installation and Real Results\",\n  \"description\": \"Every OpenClaw session starts with the same problem: your entire MEMORY.md gets injected into the API call. Every single turn. That\u2019s 2,000+ tokens of context \u2014\",\n  \"author\": {\n    \"@type\": \"Person\",\n    \"name\": \"DesignCopy\"\n  },\n  \"datePublished\": \"2026-03-02T16:34:30\",\n  \"dateModified\": \"2026-03-07T13:48:05\",\n  \"image\": {\n    \"@type\": \"ImageObject\",\n    \"url\": \"https:\/\/designcopy.net\/wp-content\/uploads\/logo.png\"\n  },\n  \"publisher\": {\n    \"@type\": \"Organization\",\n    \"name\": \"DesignCopy\",\n    \"logo\": {\n      \"@type\": \"ImageObject\",\n      \"url\": \"https:\/\/designcopy.net\/wp-content\/uploads\/logo.png\"\n    }\n  },\n  \"mainEntityOfPage\": {\n    \"@type\": \"WebPage\",\n    \"@id\": \"https:\/\/designcopy.net\/en\/qmd-local-search-setup-guide\/\"\n  }\n}\n<\/script><br \/>\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"FAQPage\",\n  \"mainEntity\": [\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What QMD Does (And Why You Need It)\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Without QMD, OpenClaw\u2019s memory system works like a sledgehammer. 
It reads your full MEMORY.md file \u2014 plus any daily memory files \u2014 and dumps all of it into every conversation turn. If your memory files total 2,000 tokens, you\u2019re burning 2,000 tokens on context every single time you send a message. With QMD, that same memory gets indexed locally. When you ask a question, QMD searches your memory files and returns only the relevant snippets \u2014 typically 3-5 matches totaling around 200 tokens. Here\u2019\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Want the Full Token Optimization Stack?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"QMD is one piece of the puzzle. Read our full OpenClaw Token Optimization Guide for model routing, memory management, and cost tracking.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"When to Install QMD\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"Not every OpenClaw setup needs QMD. Here\u2019s a quick decision framework. Install QMD when: Your MEMORY.md + daily memory files exceed ~2,000 tokens combined You run 20+ OpenClaw sessions per day You store sensitive project data in memory files You want faster, more relevant memory retrieval Skip QMD for now if: Your total memory files are under 1,000 tokens You only use OpenClaw occasionally You don\u2019t want a 2 GB model download on your machine\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"What to Read Next\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"This QMD local search setup is one piece of a larger token optimization stack. 
Here\u2019s where to go from here: Full optimization strategy: OpenClaw Token Optimization Guide \u2014 the pillar post covering model routing, memory management, and cost tracking. Cut costs further: AI Agent Cost Reduction via Model Routing \u2014 route simple tasks to cheaper models automatically. Browse all guides: AI Automation Hub \u2014 every post in this series.\"\n      }\n    },\n    {\n      \"@type\": \"Question\",\n      \"name\": \"Ready to Optimize Your Full OpenClaw Stack?\",\n      \"acceptedAnswer\": {\n        \"@type\": \"Answer\",\n        \"text\": \"QMD handles memory. Model routing handles cost. Security hardening handles safety. Get the complete picture in our OpenClaw Token Optimization Guide.\"\n      }\n    }\n  ]\n}\n<\/script><br \/>\n<script type=\"application\/ld+json\">\n{\n  \"@context\": \"https:\/\/schema.org\",\n  \"@type\": \"WebPage\",\n  \"name\": \"Setting Up QMD for Local AI Search: Installation and Real Results\",\n  \"url\": \"https:\/\/designcopy.net\/en\/qmd-local-search-setup-guide\/\",\n  \"speakable\": {\n    \"@type\": \"SpeakableSpecification\",\n    \"cssSelector\": [\n      \"h1\",\n      \"h2\",\n      \"p\"\n    ]\n  }\n}\n<\/script><br \/>\n<!-- designcopy-schema-end --><\/p>\n","protected":false},"excerpt":{"rendered":"<p>Every OpenClaw session starts with the same problem: your entire MEMORY.md gets injected into the API call. Every single turn. That\u2019s 2,000+ tokens of context \u2014 whether the conversation needs it or not. QMD fixes this. Built by Tobi Lutke (Shopify\u2019s CEO), QMD replaces brute-force memory injection with local hybrid search. 
It combines BM25 keyword [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":262020,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"_et_pb_use_builder":"","_et_pb_old_content":"","_et_gb_content_width":"","footnotes":""},"categories":[1435],"tags":[],"class_list":["post-261992","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-ai-seo","et-has-post-format-content","et_post_format-et-post-format-standard"],"_links":{"self":[{"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/posts\/261992","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/comments?post=261992"}],"version-history":[{"count":5,"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/posts\/261992\/revisions"}],"predecessor-version":[{"id":264329,"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/posts\/261992\/revisions\/264329"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/media\/262020"}],"wp:attachment":[{"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/media?parent=261992"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/categories?post=261992"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/designcopy.net\/en\/wp-json\/wp\/v2\/tags?post=261992"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}