SQLite as a Markdown index
SQLite is sometimes used as a file store for many small files.
The SQLite report shows cases where storing many small BLOBs in SQLite can be faster than storing them as separate files.
The rough idea is that a database connection can be reused, while direct filesystem access often has to open many separate files. That can make SQLite attractive when the workload repeatedly reads many small documents.
I wanted to check whether the same idea applies to Markdown files, and especially whether SQLite can be useful as a Markdown search index.
This is a toy benchmark, not a general performance claim.
Experiment
I compared direct filesystem access with SQLite on my Mac laptop.
Environment:
- MacBook Pro, Apple M3 Pro, 11-core CPU
- 18 GB memory
- macOS 26.5
- Internal SSD
- APFS, mounted with journaling
Benchmark conditions:
- 10,000 Markdown documents
- Different document sizes
- SQLite table with a secondary index on path
- SQLite FTS5 table for full-text search
Random reads
First, I tested random reads.
Dataset:
- 10,000 documents
- 5,000 random reads
Result:
| target | elapsed | ops/sec | bytes read | MB/sec | p50 | p95 | p99 |
|---|---|---|---|---|---|---|---|
| fs | 0.123163s | 40,596.59 | 20,480,000 | 166.28 | 0.010584ms | 0.085292ms | 0.092292ms |
| sqlite | 0.142589s | 35,065.92 | 20,480,000 | 143.63 | 0.002833ms | 0.186084ms | 0.289584ms |
Observations:
- The filesystem had better total throughput in this run.
- SQLite had a lower p50 latency.
- The filesystem had better p95 and p99 latency.
SQLite looked better at p50, but worse at p95 and p99. My guess is that this result depends heavily on cache behavior.
For SQLite, a lookup may need to touch both the secondary index and the Markdown body. If the relevant pages are not already warm, that can cost more than opening and reading a small file directly.
Random reads with warm cache
Dataset:
- 10,000 documents
- 1,000 random reads
- 5 repeats
When I ran the benchmark multiple times, the result changed.
Result:
| document size | target | ops/sec | p50 | p95 |
|---|---|---|---|---|
| 512 B | files | 82,316.78 | 0.011583ms | 0.016333ms |
| 512 B | SQLite | 416,399.48 | 0.002417ms | 0.003125ms |
| 4 KB | files | 86,373.73 | 0.011416ms | 0.012625ms |
| 4 KB | SQLite | 303,616.65 | 0.003334ms | 0.004042ms |
| 32 KB | files | 68,254.29 | 0.014125ms | 0.018458ms |
| 32 KB | SQLite | 120,042.24 | 0.008250ms | 0.009166ms |
| 256 KB | files | 31,154.03 | 0.031792ms | 0.035375ms |
| 256 KB | SQLite | 21,683.13 | 0.044250ms | 0.050584ms |
Observations:
With a warm cache, SQLite was faster for small documents up to 32 KB. At 256 KB, direct filesystem reads were faster.
So the read result is not simply "SQLite is faster." It depends on document size, cache state, and access pattern.
In a read-heavy application, this can still be useful. SQLite has its own page cache, and the OS also has a page cache. If a small subset of Markdown files gets most of the traffic, repeated reads may benefit from that cache behavior.
Random writes
Dataset:
- 10,000 documents
- 4 KB each
- 1,000 whole-document updates
Result:
| target | elapsed | updates/sec | MB/sec |
|---|---|---|---|
| files | 0.110553s | 9,045.40 | 37.05 |
| SQLite | 0.275423s | 3,630.78 | 14.87 |
Observations:
Direct filesystem overwrite was about 2.49x faster than SQLite for this whole-document update benchmark.
For a write-heavy workload, SQLite is not automatically a win. Maintaining indexes adds write cost.
Search
The most interesting result was search performance.
FTS5 is fast because it builds an inverted index.
Without an index, tools like rg and SQL LIKE need to scan document bodies and check whether each document contains the query. With an inverted index, SQLite can look up the query term first and jump directly to the matching documents.
For example:
doc1: sqlite is fast
doc2: markdown search is useful
doc3: sqlite search index
An inverted index stores the mapping in the opposite direction:
sqlite -> doc1, doc3
search -> doc2, doc3
markdown -> doc2
So when the query is sqlite, the search engine can start from doc1 and doc3 instead of checking every document.
Dataset:
- 10,000 documents
- 4 KB each
Result:
| query | target | elapsed | matches |
|---|---|---|---|
| latency | rg over files | 0.187665s | 9,920 |
| latency | SQLite LIKE | 0.075956s | 9,920 |
| latency | SQLite FTS5 | 0.000809s | 9,920 |
| 00009999 | rg over files | 0.112091s | 1 |
| 00009999 | SQLite LIKE | 0.045543s | 1 |
| 00009999 | SQLite FTS5 | 0.000847s | 1 |
Observations:
SQLite FTS5 was much faster than scanning files with rg in this benchmark.
The tradeoff is index cost:
- SQLite store size increased from about 46.8 MB to 106.3 MB
- Building the FTS store took about 2.1s
So FTS5 is not free, but for repeated searches it can be a good tradeoff.
Conclusion
SQLite is not always faster than the filesystem for Markdown storage.
- For cold or mixed random reads, direct filesystem reads can be competitive or faster.
- For warm random reads of small documents, SQLite can be faster.
- For whole-document writes, direct filesystem writes were faster in this benchmark.
- For full-text search, SQLite FTS5 was dramatically faster than scanning files with rg.
This kind of index can be useful for personal notes, documentation sites, local wikis, or AI-agent memory systems where many Markdown files are searched repeatedly.
Mdlite
You can try this idea with mdlite, a SQLite-based Markdown index.