Heap
Here is the definition of a heap:
- It is a complete binary tree.
- The parent is always greater (max-heap) or smaller (min-heap) than its children.
A complete binary tree is a binary tree where all levels are fully filled, except possibly the last, which is filled from left to right.
# complete binary tree
10
/ \
7 9
/ \
3 5
# not complete binary tree
10
/
7
\
6
In a max-heap, every parent is greater than or equal to its children. Here is an example:
10
/ \
7 9
/ \
3 5
- 10 ≥ 7, 9
- 7 ≥ 3, 5
Implementation
Here is a basic interface for a min-heap. We can push a value to the heap, pop the root value, and peek at the current smallest value.
class MinHeap:
def __init__(self, values=None):
def push(self, value):
def pop(self):
def peek(self):
def __len__(self):
def is_empty(self):
While we can implement a heap using a node-based tree structure:
class Node:
def __init__(self, value):
self._value = value
self._left = None
self._right = None
It is more common to use an array for implementation. Array-based heaps are widely used due to their simplicity and efficiency.
Simplicity: We don't need to store left or right pointers. We simply add a new value to the array and move it to the proper position. Parent and child indices can be calculated directly from the current position:
current_idx (>0)
parent = (current_idx - 1) // 2
children (left) = 2 * current_idx + 1
children (right) = 2 * current_idx + 2
Efficiency:
- Space Complexity: We save memory by not storing pointers for every node.
- Cache Locality: Heap values are stored in a single contiguous array. This improves CPU cache hits because the values are loaded near each other in memory.
Adding a new value
To add a new value to the heap, we append the value to the end of the array to maintain the complete binary tree property. Then, we move it up to its correct position (bubble-up).
Example: Adding 2 to a min-heap.
1. Add 2 to the end.
3
/ \
5 9
/ \ /
8 10 2
2. Compare 2 with parent 9. 2 < 9, so swap.
3
/ \
5 2
/ \ /
8 10 9
3. Compare 2 with parent 3. 2 < 3, so swap.
2
/ \
5 3
/ \ /
8 10 9
def _bubble_up(self, idx: int):
while idx > 0:
parent = (idx - 1) // 2
# For MinHeap, if parent > current, swap them
if self._heap[parent] > self._heap[idx]:
self._heap[parent], self._heap[idx] = self._heap[idx], self._heap[parent]
idx = parent
else:
break
def push(self, value):
self._heap.append(value)
self._bubble_up(len(self._heap) - 1)
In the worst case, the new item traverses from the bottom to the root. Thus, the time complexity is O(log N), where N is the number of nodes.
Pop the root value
To remove the root value (the minimum element) from the heap, we cannot simply remove it from index 0. Doing so would break the tree structure. Instead, we swap the root with the last element, remove the last element, and then move the new root down to its correct position (sift-down).
Example: Popping the root (2) from the heap.
1. Swap root (2) with the last element (9).
9
/ \
5 3
/ \ /
8 10 2
2. Remove the last element (2).
9
/ \
5 3
/ \
8 10
3. Compare 9 with children 5 and 3. 3 is smallest. Swap 9 and 3.
3
/ \
5 9
/ \
8 10
def _sift_down(self, idx: int):
n = len(self._heap)
while True:
left = 2 * idx + 1
right = left + 1
smallest = idx
if left < n and self._heap[left] < self._heap[smallest]:
smallest = left
if right < n and self._heap[right] < self._heap[smallest]:
smallest = right
if smallest == idx:
break
self._heap[smallest], self._heap[idx] = self._heap[idx], self._heap[smallest]
idx = smallest
def pop(self):
if len(self._heap) == 0:
raise IndexError("pop from empty heap")
# 1. Swap the root with the last element
self._heap[0], self._heap[-1] = self._heap[-1], self._heap[0]
# 2. Remove the last element (original root)
popped = self._heap.pop()
# 3. Sift down the new root to its proper position
self._sift_down(0)
return popped
Final implementation
class MinHeap:
def __init__(self, values=None):
"""Create a min-heap from optional initial values (heapified in-place)."""
self._heap = [] if values is None else list(values)
if self._heap:
try:
self._heapify()
except TypeError as exc:
raise TypeError("Heap items must be mutually comparable") from exc
def _heapify(self):
"""
Bottom-up heapify to enforce the heap property in O(n).
Starts at the last parent index ( (n-2)//2 ) and sifts each node down,
skipping leaves because they already satisfy the heap property.
Please check ./Heapify-Computing-Complexity.md for more details.
"""
for idx in range((len(self._heap) - 2) // 2, -1, -1):
self._sift_down(idx)
def _sift_down(self, idx):
"""
Restore order by pushing the item at idx down to its correct spot.
Picks the smaller child (for min-heap) to swap with until both children
are >= current. Example:
heap = [3, 4, 1] at idx=0 -> swap with child 1 -> [1, 4, 3].
"""
n = len(self._heap)
while True:
left = 2 * idx + 1
right = left + 1
smallest = idx
if left < n and self._heap[left] < self._heap[smallest]:
smallest = left
if right < n and self._heap[right] < self._heap[smallest]:
smallest = right
if smallest == idx:
break
self._heap[idx], self._heap[smallest] = (
self._heap[smallest],
self._heap[idx],
)
idx = smallest
def _bubble_up(self, idx):
"""Restore order by bubbling the item at idx up toward the root."""
while idx > 0:
parent = (idx - 1) // 2
if self._heap[idx] < self._heap[parent]:
self._heap[idx], self._heap[parent] = (
self._heap[parent],
self._heap[idx],
)
idx = parent
else:
break
def _ensure_comparable(self, value):
"""Fail fast if the new value cannot be compared with existing items."""
if not self._heap:
return
try:
_ = value < self._heap[0]
_ = self._heap[0] < value
except TypeError as exc:
raise TypeError("Heap items must be mutually comparable") from exc
def push(self, value):
"""Insert a value and restore the heap by bubbling up."""
self._ensure_comparable(value)
self._heap.append(value)
self._bubble_up(len(self._heap) - 1)
def pop(self):
"""Remove and return the smallest item; raise if empty."""
if self.is_empty():
raise IndexError("Heap is empty")
self._heap[0], self._heap[-1] = self._heap[-1], self._heap[0]
value = self._heap.pop()
if self._heap:
self._sift_down(0)
return value
def peek(self):
"""Return the smallest item without removing it; raise if empty."""
if self.is_empty():
raise IndexError("Heap is empty")
return self._heap[0]
def __len__(self):
"""Return number of items in the heap."""
return len(self._heap)
def is_empty(self):
"""Return True when the heap has no items."""
return len(self._heap) == 0
Time Complexity of _heapify
def _heapify(self):
for idx in range((len(self._heap) - 2) // 2, -1, -1):
self._sift_down(idx)
The time complexity is O(N). This method performs a sift-down operation on all nodes, starting from the last non-leaf node up to the root.
An intuitive explanation is: "Lower nodes are many but move very little. Upper nodes move more but are extremely few."
Level (from bottom) Node count (approx) Max work per node
-------------------------------------------------------------------
0 (leaves) n/2 0
1 n/4 1
2 n/8 2
3 n/16 3
... ... ...
H (root) 1 H
- Leaves (≈ n/2 nodes): Require zero work.
- Level 1 (≈ n/4 nodes): Move at most one step.
- Level 2 (≈ n/8 nodes): Move at most two steps.
- Root: Can move H steps, but there is only one root.
This "many-but-cheap vs. few-but-expensive" pattern results in a linear total cost.
We can calculate the total work by summing the work for nodes at each height h:
Total Work = Σ (number of nodes at height h) × (cost per node)
Approximating the number of nodes at height h as n / 2^(h+1) and the cost as h:
T(n) ≈ Σ [h=0 to ∞] (n / 2^(h+1)) * h
T(n) ≈ n * Σ [h=0 to ∞] (h / 2^(h+1))
Let S be the sum:
S = Σ [h=1 to ∞] (h / 2^h)
S = 1/2 + 2/4 + 3/8 + 4/16 + ...
We can solve for S by computing S/2:
S/2 = 1/4 + 2/8 + 3/16 + 4/32 + ...
Subtracting S/2 from S:
S - S/2 = (1/2) + (2/4 - 1/4) + (3/8 - 2/8) + ...
S/2 = 1/2 + 1/4 + 1/8 + ...
The right-hand side is a geometric series that sums to 1. Therefore:
S/2 = 1 => S = 2
Plugging this back into our formula for T(n):
T(n) ≈ n * (S/2)
T(n) ≈ n * 1 = n
Thus, heapify runs in O(N) time.