Every OpenClaw session starts with the same problem: your entire MEMORY.md gets injected into the API call. Every single turn. That’s 2,000+ tokens of context — whether the conversation needs it or not.
QMD fixes this. Built by Tobi Lutke (Shopify’s CEO), QMD replaces brute-force memory injection with local hybrid search. It combines BM25 keyword matching, vector embeddings, and LLM reranking to retrieve only the 3-5 snippets that actually matter for each query.
The results from our QMD local search setup: 90% fewer memory tokens per session, 47ms average search latency, and zero data leaving our machine. Here’s exactly how to install, configure, and verify it.
What QMD Does (And Why You Need It)
Without QMD, OpenClaw’s memory system works like a sledgehammer. It reads your full MEMORY.md file — plus any daily memory files — and dumps all of it into every conversation turn. If your memory files total 2,000 tokens, you’re burning 2,000 tokens on context every single time you send a message.
With QMD, that same memory gets indexed locally. When you ask a question, QMD searches your memory files and returns only the relevant snippets — typically 3-5 matches totaling around 200 tokens.
MEMORY TOKEN REDUCTION
90%
~2,000 tokens → ~200 tokens per turn
Here’s what makes QMD different from cloud-based RAG solutions:
- 100% local execution — no API calls, no data transmitted anywhere
- Hybrid search pipeline — BM25 (keyword matching) + vector embeddings + LLM reranking
- Open-source — created by Tobi Lutke, available on npm
- Sub-50ms latency — faster than a single API roundtrip
The hybrid approach matters. BM25 alone misses semantic connections. Vector search alone misses exact keyword matches. QMD runs both, then uses a lightweight LLM reranker to sort the combined results by relevance.
Prerequisites
Before starting your QMD local search setup, you need a few things in place. The biggest requirement — and the one most people trip over — is WSL2.
⚠️ Warning
QMD will NOT install on native Windows. The npm package depends on a native sqlite-vec binary that doesn’t compile on Windows. You must use WSL2 (Windows Subsystem for Linux).
What you need:
| Requirement | Minimum Version | Check Command |
|---|---|---|
| WSL2 (Ubuntu recommended) | WSL 2.0+ | wsl --version |
| Node.js (inside WSL2) | 22.0+ | node --version |
| build-essential | Latest | dpkg -l build-essential |
| Free disk space | ~3 GB | df -h |
Run these quick checks inside your WSL2 terminal:
PREREQUISITE CHECK
node --version   # Must show v22.x or higher
gcc --version    # Must return a version (any)
make --version   # Must return a version (any)
If node --version shows anything below 22, update Node inside WSL2 before proceeding. If gcc or make aren’t found, don’t worry — we’ll install them in Step 3 below.
Step-by-Step Installation
Follow these steps in order inside your WSL2 terminal (not PowerShell, not CMD).
Step 1: Open WSL2
Launch your WSL2 Ubuntu terminal. You can do this from Windows Terminal, or type wsl in PowerShell.
LAUNCH WSL2
wsl
Step 2: Verify Node.js Version
CHECK NODE VERSION
node --version
You should see v22.x.x or higher. If not, install Node 22 via nvm or your preferred method before continuing.
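That check can be scripted so it fails loudly instead of silently. The sketch below assumes a POSIX shell inside WSL2; the nvm commands in the hint are one common upgrade path, not the only one.

```shell
# Verify Node.js meets QMD's v22+ requirement; print a hint otherwise.
REQUIRED_MAJOR=22
MAJOR=$(node --version 2>/dev/null | sed 's/^v//' | cut -d. -f1)
if [ "${MAJOR:-0}" -ge "$REQUIRED_MAJOR" ]; then
  echo "Node.js OK (v$MAJOR)"
else
  # With nvm this is typically: nvm install 22 && nvm use 22
  echo "Node.js missing or older than v$REQUIRED_MAJOR - upgrade before continuing"
fi
```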
Step 3: Install build-essential
This package provides gcc, g++, and make — all required for compiling QMD’s native sqlite-vec module.
INSTALL BUILD TOOLS
sudo apt-get update && sudo apt-get install -y build-essential
💡 Pro Tip
We hit a build-essential error on our first try. The npm install requires gcc and make to compile the native sqlite-vec module. If you skip this step, you’ll get a cryptic node-gyp error during Step 4. Don’t skip it.
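Before moving on, you can confirm all three compilers node-gyp needs are actually on the PATH. This is a quick sketch using standard shell built-ins:

```shell
# Confirm the build tools node-gyp needs are present before installing QMD.
for tool in gcc g++ make; do
  if command -v "$tool" >/dev/null 2>&1; then
    echo "$tool: $(command -v "$tool")"
  else
    echo "$tool: MISSING - install build-essential first"
  fi
done
```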
Step 4: Install QMD Globally
INSTALL QMD
npm install -g @tobilu/qmd
This may take 1-2 minutes. You’ll see compilation output for the native modules — that’s normal.
Step 5: Verify Installation
VERIFY QMD
qmd --version
Expected output: qmd v1.0.7 or higher. If you see this, you’re ready to configure.
⚠️ Warning
Forgot your WSL2 Ubuntu password? You can reset it from PowerShell: wsl -u root, then passwd yourusername. Replace yourusername with your actual WSL2 username.
Want the Full Token Optimization Stack?
QMD is one piece of the puzzle. Read our full OpenClaw Token Optimization Guide for model routing, memory management, and cost tracking.
Configuring QMD in openclaw.json
Once QMD is installed, you need to tell OpenClaw to use it. Open your openclaw.json configuration file and add (or update) the memory block.
openclaw.json — MEMORY CONFIGURATION
"memory": {
"backend": "qmd",
"qmd": {
"searchMode": "hybrid",
"includeDefaultMemory": true,
"paths": ["~/openclaw-workspace/memory"],
"updateInterval": 300,
"maxResults": 5
}
}

Here’s what each field controls:

- backend: "qmd" — Switches OpenClaw from full-file injection to QMD-powered search
- searchMode: "hybrid" — Combines BM25 keyword matching with vector similarity search. This is the recommended mode; you can also use "bm25" or "vector" alone, but hybrid produces the best recall
- includeDefaultMemory: true — Still loads critical baseline items from MEMORY.md (like your name and project context). Set to false if you want QMD to handle everything
- paths — An array of directories QMD should index. Point this at your memory folder(s)
- updateInterval: 300 — Re-indexes every 300 seconds (5 minutes). Lower values mean fresher search results but slightly more CPU usage
- maxResults: 5 — Returns the top 5 most relevant snippets per query. We’ve found 5 to be the sweet spot — enough context without token bloat
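If you'd rather start from a complete file, here is a minimal sketch that writes the memory block as a standalone openclaw.json and sanity-checks the syntax. The file location is an assumption — write it wherever your OpenClaw install expects its config, and merge by hand if you already have other top-level settings.

```shell
# Write a minimal openclaw.json containing only the memory block.
# CAUTION: overwrites any openclaw.json in the current directory.
cat > openclaw.json <<'EOF'
{
  "memory": {
    "backend": "qmd",
    "qmd": {
      "searchMode": "hybrid",
      "includeDefaultMemory": true,
      "paths": ["~/openclaw-workspace/memory"],
      "updateInterval": 300,
      "maxResults": 5
    }
  }
}
EOF

# Sanity-check the JSON syntax (python3 ships with Ubuntu on WSL2)
python3 -m json.tool openclaw.json > /dev/null && echo "openclaw.json is valid JSON"
```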
💡 Pro Tip
Keep includeDefaultMemory set to true when you first switch to QMD. This ensures your core identity and project context still loads, while QMD handles the supplementary memory search. You can experiment with false later once you trust the search quality.
First Run and Indexing
With QMD installed and configured, it’s time to build the initial search index.
Step 6: Run the Initial Index
BUILD THE INDEX
qmd index ~/openclaw-workspace/memory
Important: The first run downloads an embedding model of approximately 2 GB. This is a one-time download. Subsequent indexing runs take only a few seconds, even with hundreds of memory files.
Step 7: Verify Search Works
Run a test query to confirm everything is wired up:
TEST SEARCH QUERY
qmd search "model routing" --limit 3
You should see 1-3 results with relevance scores. If QMD returns results from your memory files, the setup is complete. If it returns nothing, double-check that the paths in your openclaw.json point to a directory containing .md files.
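If the search comes back empty, a quick way to confirm the index has something to work with is to count the markdown files under the configured path (shown here with the path used in this guide; adjust it to match your own config):

```shell
# Count .md files under the directory QMD is configured to index.
# Zero here means there is nothing for QMD to find.
MEMORY_DIR=~/openclaw-workspace/memory
MD_COUNT=$(find "$MEMORY_DIR" -name '*.md' 2>/dev/null | wc -l)
echo "Markdown files under $MEMORY_DIR: $MD_COUNT"
```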
💡 Pro Tip
After the initial index, QMD re-indexes automatically based on your updateInterval setting. But if you add a large batch of new memory files, run qmd index manually to pick them up immediately.
Save More With Model Routing
QMD cuts memory tokens. Model routing cuts everything else. See our AI Agent Cost Reduction via Model Routing guide for the full strategy.
Real Results — Before and After
We ran QMD for two weeks on our production OpenClaw setup before writing this post. Here’s what the numbers looked like.
| Metric | Before QMD | After QMD |
|---|---|---|
| Memory tokens per turn | ~2,000 | ~200 |
| Search latency | N/A (full file load) | 47ms average |
| Data leaving machine | Depends on model provider | Never |
| Estimated monthly memory cost | ~$5–8 wasted | ~$0.50 |
| Setup time | None | ~20 minutes |
AVERAGE SEARCH LATENCY
47ms
Measured over 14 days, ~200 queries/day
DATA EXPOSURE RISK
$0
All search runs locally — nothing sent to external servers
The cost savings alone justified the 20-minute setup. But the privacy benefit is the bigger win for teams handling sensitive project data.
When to Install QMD
Not every OpenClaw setup needs QMD. Here’s a quick decision framework.
Install QMD when:
- Your MEMORY.md + daily memory files exceed ~2,000 tokens combined
- You run 20+ OpenClaw sessions per day
- You store sensitive project data in memory files
- You want faster, more relevant memory retrieval
Skip QMD for now if:
- Your total memory files are under 1,000 tokens
- You only use OpenClaw occasionally
- You don’t want a 2 GB model download on your machine
💡 Pro Tip
We installed QMD when our memory files hit 3,500 tokens. The break-even point is around 2,000 tokens — below that, the full-file injection approach is fine and the 2 GB embedding model download isn’t worth the disk space.
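To see where you stand relative to that break-even point, you can get a rough token estimate from a word count. The ~0.75 words-per-token ratio used here is a common heuristic, not an exact tokenizer, and the path is the one used throughout this guide:

```shell
# Rough token estimate for memory files: tokens ~= words / 0.75.
# Adjust the glob to wherever your memory files actually live.
WORDS=$(cat ~/openclaw-workspace/memory/*.md 2>/dev/null | wc -w)
echo "Approximate memory tokens: $(( WORDS * 4 / 3 ))"
```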
Troubleshooting Common Mistakes
Even with a straightforward setup, we hit a few snags. Here are the most common ones and their fixes.
1. “node-gyp rebuild failed” during npm install
You forgot to install build-essential. Run sudo apt-get install -y build-essential and then retry the npm install.
2. “command not found: qmd” after installation
Your npm global bin directory isn’t in your PATH. Run npm config get prefix to find the global directory, then add {prefix}/bin to your shell’s PATH.
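One way to apply that fix persistently, sketched for bash (zsh users would edit ~/.zshrc instead):

```shell
# Append npm's global bin directory to PATH via ~/.bashrc, if npm exists.
if command -v npm >/dev/null 2>&1; then
  NPM_BIN="$(npm config get prefix)/bin"
  echo "export PATH=\"$NPM_BIN:\$PATH\"" >> ~/.bashrc
  echo "Added $NPM_BIN to PATH - restart your shell or run: source ~/.bashrc"
else
  echo "npm not found in this shell - are you inside WSL2?"
fi
```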
3. QMD returns zero results on search
Check that your paths in openclaw.json point to a directory containing .md files. Also run qmd index manually — the auto-index might not have triggered yet.
4. “Permission denied” on sudo commands
Reset your WSL2 password from PowerShell: wsl -u root, then passwd yourusername.
5. First index hangs for 10+ minutes
That’s the 2 GB embedding model downloading. Check your internet connection. If it’s slow, let it finish — subsequent runs take seconds.
FAQ
Does QMD work on Windows without WSL2?
No. QMD depends on a native sqlite-vec binary that requires a Linux compilation environment. As of early 2026, there’s no Windows-native build. WSL2 is required for Windows users.
How much disk space does QMD need?
About 2.5-3 GB total. The embedding model is approximately 2 GB, and the search index adds a few hundred MB depending on how many memory files you have.
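To check you have that much headroom before kicking off the first index:

```shell
# Show free space on the filesystem holding your home directory.
df -h "$HOME" | awk 'NR==2 {print "Available: " $4}'
```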
Can I use QMD with other AI agents besides OpenClaw?
Yes. QMD is a standalone search tool. Any agent or application that can call a CLI command can use qmd search to retrieve relevant snippets. The openclaw.json integration is specific to OpenClaw, but the tool itself is agent-agnostic.
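For example, any script or agent runtime that can execute shell commands could pull snippets like this (the query string is just an illustration; the guard keeps the sketch safe to run where qmd isn't installed):

```shell
# Any tool that can shell out can reuse the local index.
if command -v qmd >/dev/null 2>&1; then
  qmd search "deployment checklist" --limit 3
else
  echo "qmd not installed - see the installation steps above"
fi
```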
What is QMD’s hybrid search?
Hybrid search combines two retrieval methods. BM25 matches documents by keyword frequency (great for exact terms). Vector search matches by semantic meaning (great for related concepts). QMD runs both, merges the results, and uses a lightweight LLM reranker to sort them by relevance.
Who created QMD?
QMD was created by Tobi Lutke, the CEO of Shopify. He released it as an open-source tool for local AI memory search. It’s available via npm as @tobilu/qmd.
What to Read Next
This QMD local search setup is one piece of a larger token optimization stack. Here’s where to go from here:
- Full optimization strategy: OpenClaw Token Optimization Guide — the pillar post covering model routing, memory management, and cost tracking
- Cut costs further: AI Agent Cost Reduction via Model Routing — route simple tasks to cheaper models automatically
- Browse all guides: AI Automation Hub — every post in this series
☑ QMD Setup Checklist
- ☐ WSL2 installed and running
- ☐ Node.js 22+ verified inside WSL2
- ☐ build-essential installed (gcc, make)
- ☐ QMD installed globally via npm
- ☐ qmd --version returns v1.0.7+
- ☐ openclaw.json memory block configured
- ☐ Initial index built with qmd index
- ☐ Test search returning relevant results
🔎 Key Takeaways
- QMD replaces full memory file injection with local hybrid search — cutting ~2,000 tokens down to ~200 per turn
- It runs entirely on your machine: 47ms latency, zero data transmitted externally
- WSL2 is mandatory on Windows — QMD won’t compile natively
- Install build-essential before npm install, or you’ll hit a node-gyp compilation failure
- The break-even point is around 2,000 tokens of memory files — below that, full injection is fine
Ready to Optimize Your Full OpenClaw Stack?
QMD handles memory. Model routing handles cost. Security hardening handles safety. Get the complete picture in our OpenClaw Token Optimization Guide.
