How B+Tree works in databases

Hash tables provide O(1) lookups, so why do most relational databases rely on B+Tree indexes instead? A B+Tree requires O(logN) time to find a record, which seems slower than a hash table at first glance.

Many relational databases use B+Tree indexes because they can efficiently handle real-world access patterns.

Let's consider a single-key lookup. A single-key lookup is a query such as WHERE id = 123.

Hash indexes locate records through a path such as hash(key) -> bucket -> record. As a result, a point lookup takes O(1) time.

Unlike a hash index, a B+Tree must traverse multiple levels of the tree. A search follows a path such as root page -> internal page -> ... -> leaf page. As a result, a point lookup takes O(logN) time.

Therefore, when considering only point lookups, hash indexes have an advantage over B+Trees.

However, databases do not only perform lookups.

Let's consider a range query. A range query is a query such as WHERE id BETWEEN 1000 AND 2000. Hash indexes do not preserve key order. Therefore, a hash index cannot jump to the first key in the range and then scan adjacent keys sequentially. In most cases, a range predicate cannot be served efficiently by a hash index and degrades to a full scan of the table or index, which is O(N).

On the other hand, a B+Tree requires O(logN) time to find the first key. However, once the first leaf page is found, a B+Tree can scan subsequent records by following links between leaf pages. As a result, a range query on a B+Tree takes O(logN + K/P) time, where P is the number of records stored in a page.

The key difference in range queries is that a hash index does not keep keys in sorted order, whereas a B+Tree does. Since each B+Tree leaf page contains multiple sorted records, the I/O cost of a B+Tree range query grows with the number of leaf pages that must be scanned rather than requiring a scan of the entire index.

At this point, you might be concerned about the O(logN) lookup cost of a B+Tree. However, in real-world databases, this cost is surprisingly small. For example, assume that:

page size = 4096 bytes
internal node entry size = 16 bytes

In this case, an internal node can reference approximately

4096 / 16 = 256

child pages.

This branching factor is commonly referred to as the fanout of the tree. If the fanout is 256 and the height of the tree is 4, the B+Tree can index approximately 256^4 ≒ 4.3 billion records. The lookup cost of a B+Tree remains small in practice thanks to its large fanout.

A hash index may perform better for point lookups. However, when range scans and ordered traversals are taken into account, B+Trees provide a more versatile and efficient indexing structure.