Every OpenClaw session starts with the same problem: your entire MEMORY.md gets injected into the API call. Every single turn. That’s 2,000+ tokens of context — whether the conversation needs it or not.

QMD fixes this. Built by Tobi Lütke (Shopify’s CEO), QMD replaces brute-force memory injection with local hybrid search. It combines BM25 keyword matching, vector embeddings, and LLM reranking to retrieve only the 3-5 snippets that actually matter for each query.

The results from our QMD local search setup: 90% fewer memory tokens per session, 47ms average search latency, and zero data leaving our machine. Here’s exactly how to install, configure, and verify it.


What QMD Does (And Why You Need It)

Without QMD, OpenClaw’s memory system works like a sledgehammer. It reads your full MEMORY.md file — plus any daily memory files — and dumps all of it into every conversation turn. If your memory files total 2,000 tokens, you’re burning 2,000 tokens on context every single time you send a message.

With QMD, that same memory gets indexed locally. When you ask a question, QMD searches your memory files and returns only the relevant snippets — typically 3-5 matches totaling around 200 tokens.

MEMORY TOKEN REDUCTION

90%

~2,000 tokens → ~200 tokens per turn

Here’s what makes QMD different from cloud-based RAG solutions:

  • 100% local execution — no API calls, no data transmitted anywhere
  • Hybrid search pipeline — BM25 (keyword matching) + vector embeddings + LLM reranking
  • Open-source — created by Tobi Lütke, available on npm
  • Sub-50ms latency — faster than a single API roundtrip

The hybrid approach matters. BM25 alone misses semantic connections. Vector search alone misses exact keyword matches. QMD runs both, then uses a lightweight LLM reranker to sort the combined results by relevance.
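To make the merging step concrete, here is a minimal sketch of reciprocal rank fusion (RRF), a common technique for combining a keyword ranking with a vector ranking. This illustrates the general approach, not QMD’s actual implementation; the file names and result lists are hypothetical.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge several ranked lists of doc IDs into one fused ranking.

    Each document scores 1 / (k + rank) per list it appears in;
    k=60 is the conventional damping constant from the RRF literature.
    """
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical result lists: BM25 favors exact keyword hits,
# vector search favors semantically related snippets.
bm25_hits = ["notes/routing.md", "notes/costs.md", "notes/setup.md"]
vector_hits = ["notes/routing.md", "notes/memory.md", "notes/costs.md"]

fused = reciprocal_rank_fusion([bm25_hits, vector_hits])
print(fused[0])  # a doc ranked highly by both lists wins
```

A snippet that both retrievers agree on rises to the top, while a snippet that only one method found still makes the list — which is why hybrid beats either method alone.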


Prerequisites

Before starting your QMD local search setup, you need a few things in place. The biggest requirement — and the one most people trip over — is WSL2.

⚠️ Warning

QMD will NOT install on native Windows. The npm package depends on a native sqlite-vec binary that doesn’t compile on Windows. You must use WSL2 (Windows Subsystem for Linux).

What you need:

Requirement                   Minimum Version   Check Command
WSL2 (Ubuntu recommended)     WSL 2.0+          wsl --version
Node.js (inside WSL2)         22.0+             node --version
build-essential               Latest            dpkg -l build-essential
Free disk space               ~3 GB             df -h

Run these quick checks inside your WSL2 terminal:

PREREQUISITE CHECK

node --version       # Must show v22.x or higher
gcc --version        # Must return a version (any)
make --version       # Must return a version (any)

If node --version shows anything below 22, update Node inside WSL2 before proceeding. If gcc or make aren’t found, don’t worry — we’ll install them in Step 3 below.


Step-by-Step Installation

Follow these steps in order inside your WSL2 terminal (not PowerShell, not CMD).

Step 1: Open WSL2

Launch your WSL2 Ubuntu terminal. You can do this from Windows Terminal, or type wsl in PowerShell.

LAUNCH WSL2

wsl

Step 2: Verify Node.js Version

CHECK NODE VERSION

node --version

You should see v22.x.x or higher. If not, install Node 22 via nvm or your preferred method before continuing.

Step 3: Install build-essential

This package provides gcc, g++, and make — all required for compiling QMD’s native sqlite-vec module.

INSTALL BUILD TOOLS

sudo apt-get update && sudo apt-get install -y build-essential

💡 Pro Tip

We hit a build-essential error on our first try. The npm install requires gcc and make to compile the native sqlite-vec module. If you skip this step, you’ll get a cryptic node-gyp error during Step 4. Don’t skip it.

Step 4: Install QMD Globally

INSTALL QMD

npm install -g @tobilu/qmd

This may take 1-2 minutes. You’ll see compilation output for the native modules — that’s normal.

Step 5: Verify Installation

VERIFY QMD

qmd --version

Expected output: qmd v1.0.7 or higher. If you see this, you’re ready to configure.

⚠️ Warning

Forgot your WSL2 Ubuntu password? You can reset it from PowerShell: wsl -u root, then passwd yourusername. Replace yourusername with your actual WSL2 username.

Want the Full Token Optimization Stack?

QMD is one piece of the puzzle. Read our full OpenClaw Token Optimization Guide for model routing, memory management, and cost tracking.


Configuring QMD in openclaw.json

Once QMD is installed, you need to tell OpenClaw to use it. Open your openclaw.json configuration file and add (or update) the memory block.

openclaw.json — MEMORY CONFIGURATION

"memory": {
  "backend": "qmd",
  "qmd": {
    "searchMode": "hybrid",
    "includeDefaultMemory": true,
    "paths": ["~/openclaw-workspace/memory"],
    "updateInterval": 300,
    "maxResults": 5
  }
}

Here’s what each field controls:

  1. backend: "qmd" — Switches OpenClaw from full-file injection to QMD-powered search
  2. searchMode: "hybrid" — Combines BM25 keyword matching with vector similarity search. This is the recommended mode; you can also use "bm25" or "vector" alone, but hybrid produces the best recall
  3. includeDefaultMemory: true — Still loads critical baseline items from MEMORY.md (like your name, project context). Set to false if you want QMD to handle everything
  4. paths — An array of directories QMD should index. Point this at your memory folder(s)
  5. updateInterval: 300 — Re-indexes every 300 seconds (5 minutes). Lower values mean fresher search results but slightly more CPU usage
  6. maxResults: 5 — Returns the top 5 most relevant snippets per query. We’ve found 5 to be the sweet spot — enough context without token bloat
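As a back-of-the-envelope check on the maxResults setting, you can estimate the per-turn token budget from snippet size. The 40-tokens-per-snippet figure below is an assumed average chosen to match the article’s ~200-token total, not a measurement.

```python
# Rough per-turn memory cost: full injection vs. QMD retrieval.
full_injection_tokens = 2000  # the article's ~2,000-token MEMORY.md example
avg_tokens_per_snippet = 40   # assumed average snippet size
max_results = 5               # matches maxResults in openclaw.json

qmd_tokens = avg_tokens_per_snippet * max_results
savings = 1 - qmd_tokens / full_injection_tokens
print(f"{qmd_tokens} tokens/turn, {savings:.0%} reduction")
```

Raising maxResults grows the budget linearly, so bumping it from 5 to 10 roughly doubles your memory tokens per turn.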

💡 Pro Tip

Keep includeDefaultMemory set to true when you first switch to QMD. This ensures your core identity and project context still loads, while QMD handles the supplementary memory search. You can experiment with false later once you trust the search quality.


First Run and Indexing

With QMD installed and configured, it’s time to build the initial search index.

Step 6: Run the Initial Index

BUILD THE INDEX

qmd index ~/openclaw-workspace/memory

Important: The first run downloads an embedding model of approximately 2 GB. This is a one-time download. Subsequent indexing runs take only a few seconds, even with hundreds of memory files.

Step 7: Verify Search Works

Run a test query to confirm everything is wired up:

TEST SEARCH QUERY

qmd search "model routing" --limit 3

You should see 1-3 results with relevance scores. If QMD returns results from your memory files, the setup is complete. If it returns nothing, double-check that the paths in your openclaw.json point to a directory containing .md files.

💡 Pro Tip

After the initial index, QMD re-indexes automatically based on your updateInterval setting. But if you add a large batch of new memory files, run qmd index manually to pick them up immediately.

Save More With Model Routing

QMD cuts memory tokens. Model routing cuts everything else. See our AI Agent Cost Reduction via Model Routing guide for the full strategy.


Real Results — Before and After

We ran QMD for two weeks on our production OpenClaw setup before writing this post. Here’s what the numbers looked like.

Metric                          Before QMD                   After QMD
Memory tokens per turn          ~2,000                       ~200
Search latency                  N/A (full file load)         47ms average
Data leaving machine            Depends on model provider    Never
Estimated monthly memory cost   ~$5–8 wasted                 ~$0.50
Setup time                      None                         ~20 minutes

AVERAGE SEARCH LATENCY

47ms

Measured over 14 days, ~200 queries/day

DATA EXPOSURE RISK

$0

All search runs locally — nothing sent to external servers

The cost savings alone justified the 20-minute setup. But the privacy benefit is the bigger win for teams handling sensitive project data.


When to Install QMD

Not every OpenClaw setup needs QMD. Here’s a quick decision framework.

Install QMD when:

  • Your MEMORY.md + daily memory files exceed ~2,000 tokens combined
  • You run 20+ OpenClaw sessions per day
  • You store sensitive project data in memory files
  • You want faster, more relevant memory retrieval

Skip QMD for now if:

  • Your total memory files are under 1,000 tokens
  • You only use OpenClaw occasionally
  • You don’t want a 2 GB model download on your machine

💡 Pro Tip

We installed QMD when our memory files hit 3,500 tokens. The break-even point is around 2,000 tokens — below that, the full-file injection approach is fine and the 2 GB embedding model download isn’t worth the disk space.


Troubleshooting Common Mistakes

Even with a straightforward setup, we hit a few snags. Here are the most common ones and their fixes.

1. “node-gyp rebuild failed” during npm install
You forgot to install build-essential. Run sudo apt-get install -y build-essential and then retry the npm install.

2. “command not found: qmd” after installation
Your npm global bin directory isn’t in your PATH. Run npm config get prefix to find the global directory, then add {prefix}/bin to your shell’s PATH.

3. QMD returns zero results on search
Check that your paths in openclaw.json point to a directory containing .md files. Also run qmd index manually — the auto-index might not have triggered yet.

4. “Permission denied” on sudo commands
Reset your WSL2 password from PowerShell: wsl -u root, then passwd yourusername.

5. First index hangs for 10+ minutes
That’s the 2 GB embedding model downloading. Check your internet connection. If it’s slow, let it finish — subsequent runs take seconds.


FAQ

Does QMD work on Windows without WSL2?

No. QMD depends on a native sqlite-vec binary that requires a Linux compilation environment. As of early 2026, there’s no Windows-native build. WSL2 is required for Windows users.

How much disk space does QMD need?

About 2.5-3 GB total. The embedding model is approximately 2 GB, and the search index adds a few hundred MB depending on how many memory files you have.

Can I use QMD with other AI agents besides OpenClaw?

Yes. QMD is a standalone search tool. Any agent or application that can call a CLI command can use qmd search to retrieve relevant snippets. The openclaw.json integration is specific to OpenClaw, but the tool itself is agent-agnostic.
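A minimal sketch of calling QMD from another tool, wrapping the same qmd search flags shown earlier in this post. The binary parameter is an assumption added here so the wrapper can be dry-run with any echo-like command; it is not part of QMD itself.

```python
import subprocess

def search_memory(query, limit=3, binary="qmd"):
    """Run `<binary> search <query> --limit <limit>` and return stdout lines.

    With the default binary this shells out to the real qmd CLI;
    pass binary="echo" to dry-run the argument construction instead.
    """
    result = subprocess.run(
        [binary, "search", query, "--limit", str(limit)],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.splitlines()

# Dry run with echo instead of qmd, so no index is required:
print(search_memory("model routing", limit=3, binary="echo"))
```

Any agent framework that can spawn a subprocess can use this pattern to pull relevant snippets before building its prompt.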

What is QMD’s hybrid search?

Hybrid search combines two retrieval methods. BM25 matches documents by keyword frequency (great for exact terms). Vector search matches by semantic meaning (great for related concepts). QMD runs both, merges the results, and uses a lightweight LLM reranker to sort them by relevance.

Who created QMD?

QMD was created by Tobi Lütke, the CEO of Shopify. He released it as an open-source tool for local AI memory search. It’s available via npm as @tobilu/qmd.


What to Read Next

This QMD local search setup is one piece of a larger token optimization stack. The token optimization and model routing guides linked above cover the rest.


☑ QMD Setup Checklist

  • ☐ WSL2 installed and running
  • ☐ Node.js 22+ verified inside WSL2
  • ☐ build-essential installed (gcc, make)
  • ☐ QMD installed globally via npm
  • ☐ qmd --version returns v1.0.7+
  • ☐ openclaw.json memory block configured
  • ☐ Initial index built with qmd index
  • ☐ Test search returning relevant results

🔎 Key Takeaways

  • QMD replaces full memory file injection with local hybrid search — cutting ~2,000 tokens down to ~200 per turn
  • It runs entirely on your machine: 47ms latency, zero data transmitted externally
  • WSL2 is mandatory on Windows — QMD won’t compile natively
  • Install build-essential before npm install, or you’ll hit a node-gyp compilation failure
  • The break-even point is around 2,000 tokens of memory files — below that, full injection is fine

Ready to Optimize Your Full OpenClaw Stack?

QMD handles memory. Model routing handles cost. Security hardening handles safety. Get the complete picture in our OpenClaw Token Optimization Guide.