Using SQLite as a Markdown Search Index

SQLite as a Markdown index

SQLite is sometimes used as a file store for many small files.

https://www.sqlite.org/fasterthanfs.html

The SQLite report shows cases where storing many small BLOBs in SQLite can be faster than storing them as separate files.

The rough idea is that a database connection can be reused, while direct filesystem access often has to open many separate files. That can make SQLite attractive when the workload repeatedly reads many small documents.

I wanted to check whether the same idea applies to Markdown files, and especially whether SQLite can be useful as a Markdown search index.

This is a toy benchmark, not a general performance claim.

Experiment

I compared direct filesystem access with SQLite on my Mac laptop.

https://github.com/tsugumi-sys/sqlite-performance/tree/main/small-files-io

Environment:

MacBook Pro, Apple M3 Pro, 11-core CPU
18 GB memory
macOS 26.5
Internal SSD
APFS, mounted with journaling

Benchmark conditions:

10,000 Markdown documents
Different document sizes
SQLite table with a secondary index on path
SQLite FTS5 table for full-text search

Random reads

First, I tested random reads.

Dataset:

10,000 documents
5,000 random reads

Result:

target	elapsed	ops/sec	bytes read	MB/sec	p50	p95	p99
fs	0.123163s	40,596.59	20,480,000	166.28	0.010584ms	0.085292ms	0.092292ms
sqlite	0.142589s	35,065.92	20,480,000	143.63	0.002833ms	0.186084ms	0.289584ms

Observations:

The filesystem had better total throughput in this run.
SQLite had a lower p50 latency.
The filesystem had better p95 and p99 latency.

SQLite looked better at p50, but worse at p95 and p99. My guess is that this result depends heavily on cache behavior.

For SQLite, a lookup may need to touch both the secondary index and the Markdown body. If the relevant pages are not already warm, that can cost more than opening and reading a small file directly.

Random reads with warm cache

Dataset:

10,000 documents
1,000 random reads
5 repeats

When I ran the benchmark multiple times, the result changed.

Result:

document size	target	ops/sec	p50	p95
512 B	files	82,316.78	0.011583ms	0.016333ms
512 B	SQLite	416,399.48	0.002417ms	0.003125ms
4 KB	files	86,373.73	0.011416ms	0.012625ms
4 KB	SQLite	303,616.65	0.003334ms	0.004042ms
32 KB	files	68,254.29	0.014125ms	0.018458ms
32 KB	SQLite	120,042.24	0.008250ms	0.009166ms
256 KB	files	31,154.03	0.031792ms	0.035375ms
256 KB	SQLite	21,683.13	0.044250ms	0.050584ms

Observations:

With a warm cache, SQLite was faster for small documents up to 32 KB. At 256 KB, direct filesystem reads were faster.

So the read result is not simply "SQLite is faster." It depends on document size, cache state, and access pattern.

In a read-heavy application, this can still be useful. SQLite has its own page cache, and the OS also has a page cache. If a small subset of Markdown files gets most of the traffic, repeated reads may benefit from that cache behavior.

Random writes

Dataset:

10,000 documents
4 KB each
1,000 whole-document updates

Result:

target	elapsed	updates/sec	MB/sec
files	0.110553s	9,045.40	37.05
SQLite	0.275423s	3,630.78	14.87

Observations:

Direct filesystem overwrite was about 2.49x faster than SQLite for this whole-document update benchmark.

For a write-heavy workload, SQLite is not automatically a win. Maintaining indexes adds write cost.

Search

The most interesting result was search performance.

FTS5 is fast because it builds an inverted index.

Without an index, tools like rg and SQL LIKE need to scan document bodies and check whether each document contains the query. With an inverted index, SQLite can look up the query term first and jump directly to the matching documents.

For example:

doc1: sqlite is fast
doc2: markdown search is useful
doc3: sqlite search index

An inverted index stores the mapping in the opposite direction:

sqlite   -> doc1, doc3
search   -> doc2, doc3
markdown -> doc2

So when the query is sqlite, the search engine can start from doc1 and doc3 instead of checking every document.

Dataset:

10,000 documents
4 KB each

Result:

query	target	elapsed	matches
latency	rg over files	0.187665s	9,920
latency	SQLite LIKE	0.075956s	9,920
latency	SQLite FTS5	0.000809s	9,920
00009999	rg over files	0.112091s	1
00009999	SQLite LIKE	0.045543s	1
00009999	SQLite FTS5	0.000847s	1

Observations:

SQLite FTS5 was much faster than scanning files with rg in this benchmark.

The tradeoff is index cost:

SQLite store size increased from about 46.8 MB to 106.3 MB
Building the FTS store took about 2.1s

So FTS5 is not free, but for repeated searches it can be a good tradeoff.

Conclusion

SQLite is not always faster than the filesystem for Markdown storage.

For cold or mixed random reads, direct filesystem reads can be competitive or faster.
For warm random reads of small documents, SQLite can be faster.
For whole-document writes, direct filesystem writes were faster in this benchmark.
For full-text search, SQLite FTS5 was dramatically faster than scanning files with rg.

This kind of index can be useful for personal notes, documentation sites, local wikis, or AI-agent memory systems where many Markdown files are searched repeatedly.

Mdlite

You can try this idea with mdlite, a SQLite-based Markdown index.