Algorithm: Heap

Heap

Here is the definition of a heap:

  • It is a complete binary tree.
  • The parent is always greater (max-heap) or smaller (min-heap) than its children.

A complete binary tree is a binary tree where all levels are fully filled, except possibly the last, which is filled from left to right.

# complete binary tree
        10
       /  \
      7    9
     / \
    3   5


# not complete binary tree
        10
       /
      7
       \
        6

In a max-heap, every parent is greater than or equal to its children. Here is an example:

        10
       /  \
      7    9
     / \
    3   5

- 10 ≥ 7, 9
- 7 ≥ 3, 5

Implementation

Here is a basic interface for a min-heap. We can push a value to the heap, pop the root value, and peek at the current smallest value.

class MinHeap:
    def __init__(self, values=None):

    def push(self, value):

    def pop(self):

    def peek(self):

    def __len__(self):

    def is_empty(self):

While we can implement a heap using a node-based tree structure:

class Node:
  def __init__(self, value):
    self._value = value
    self._left = None
    self._right = None

It is more common to use an array for implementation. Array-based heaps are widely used due to their simplicity and efficiency.

Simplicity: We don't need to store left or right pointers. We simply add a new value to the array and move it to the proper position. Parent and child indices can be calculated directly from the current position:

current_idx (>0)

parent = (current_idx - 1) // 2

children (left) = 2 * current_idx + 1
children (right) = 2 * current_idx + 2 

Efficiency:

  1. Space Complexity: We save memory by not storing pointers for every node.
  2. Cache Locality: Heap values are stored in a single contiguous array. This improves CPU cache hits because the values are loaded near each other in memory.

Adding a new value

To add a new value to the heap, we append the value to the end of the array to maintain the complete binary tree property. Then, we move it up to its correct position (bubble-up).

Example: Adding 2 to a min-heap.

1. Add 2 to the end.
        3
       / \
      5   9
     / \ /
    8 10 2

2. Compare 2 with parent 9. 2 < 9, so swap.
        3
       / \
      5   2
     / \ /
    8 10 9

3. Compare 2 with parent 3. 2 < 3, so swap.
        2
       / \
      5   3
     / \ /
    8 10 9
def _bubble_up(self, idx: int):
    while idx > 0:
        parent = (idx - 1) // 2
        # For MinHeap, if parent > current, swap them
        if self._heap[parent] > self._heap[idx]:
            self._heap[parent], self._heap[idx] = self._heap[idx], self._heap[parent]
            idx = parent
        else:
            break

def push(self, value):
    self._heap.append(value)
    self._bubble_up(len(self._heap) - 1)

In the worst case, the new item traverses from the bottom to the root. Thus, the time complexity is O(log N), where N is the number of nodes.

Pop the root value

To remove the root value (the minimum element) from the heap, we cannot simply remove it from index 0. Doing so would break the tree structure. Instead, we swap the root with the last element, remove the last element, and then move the new root down to its correct position (sift-down).

Example: Popping the root (2) from the heap.

1. Swap root (2) with the last element (9).
        9
       / \
      5   3
     / \ /
    8 10 2

2. Remove the last element (2).
        9
       / \
      5   3
     / \
    8 10

3. Compare 9 with children 5 and 3. 3 is smallest. Swap 9 and 3.
        3
       / \
      5   9
     / \
    8 10
def _sift_down(self, idx: int):
    n = len(self._heap)
    while True:
        left = 2 * idx + 1
        right = left + 1
        smallest = idx

        if left < n and self._heap[left] < self._heap[smallest]:
            smallest = left
        if right < n and self._heap[right] < self._heap[smallest]:
            smallest = right
        
        if smallest == idx:
            break

        self._heap[smallest], self._heap[idx] = self._heap[idx], self._heap[smallest]
        idx = smallest

def pop(self):
    if len(self._heap) == 0:
        raise IndexError("pop from empty heap")

    # 1. Swap the root with the last element
    self._heap[0], self._heap[-1] = self._heap[-1], self._heap[0]
    
    # 2. Remove the last element (original root)
    popped = self._heap.pop()
    
    # 3. Sift down the new root to its proper position
    self._sift_down(0)
    
    return popped

Final implementation

class MinHeap:
    def __init__(self, values=None):
        """Create a min-heap from optional initial values (heapified in-place)."""
        self._heap = [] if values is None else list(values)
        if self._heap:
            try:
                self._heapify()
            except TypeError as exc:
                raise TypeError("Heap items must be mutually comparable") from exc

    def _heapify(self):
        """
        Bottom-up heapify to enforce the heap property in O(n).

        Starts at the last parent index ( (n-2)//2 ) and sifts each node down,
        skipping leaves because they already satisfy the heap property.

        Please check ./Heapify-Computing-Complexity.md for more details.
        """
        for idx in range((len(self._heap) - 2) // 2, -1, -1):
            self._sift_down(idx)

    def _sift_down(self, idx):
        """
        Restore order by pushing the item at idx down to its correct spot.
        Picks the smaller child (for min-heap) to swap with until both children
        are >= current. Example:
          heap = [3, 4, 1] at idx=0 -> swap with child 1 -> [1, 4, 3].
        """
        n = len(self._heap)
        while True:
            left = 2 * idx + 1
            right = left + 1
            smallest = idx
            if left < n and self._heap[left] < self._heap[smallest]:
                smallest = left
            if right < n and self._heap[right] < self._heap[smallest]:
                smallest = right
            if smallest == idx:
                break
            self._heap[idx], self._heap[smallest] = (
                self._heap[smallest],
                self._heap[idx],
            )
            idx = smallest

    def _bubble_up(self, idx):
        """Restore order by bubbling the item at idx up toward the root."""
        while idx > 0:
            parent = (idx - 1) // 2
            if self._heap[idx] < self._heap[parent]:
                self._heap[idx], self._heap[parent] = (
                    self._heap[parent],
                    self._heap[idx],
                )
                idx = parent
            else:
                break

    def _ensure_comparable(self, value):
        """Fail fast if the new value cannot be compared with existing items."""
        if not self._heap:
            return
        try:
            _ = value < self._heap[0]
            _ = self._heap[0] < value
        except TypeError as exc:
            raise TypeError("Heap items must be mutually comparable") from exc

    def push(self, value):
        """Insert a value and restore the heap by bubbling up."""
        self._ensure_comparable(value)
        self._heap.append(value)
        self._bubble_up(len(self._heap) - 1)

    def pop(self):
        """Remove and return the smallest item; raise if empty."""
        if self.is_empty():
            raise IndexError("Heap is empty")
        self._heap[0], self._heap[-1] = self._heap[-1], self._heap[0]
        value = self._heap.pop()
        if self._heap:
            self._sift_down(0)
        return value

    def peek(self):
        """Return the smallest item without removing it; raise if empty."""
        if self.is_empty():
            raise IndexError("Heap is empty")
        return self._heap[0]

    def __len__(self):
        """Return number of items in the heap."""
        return len(self._heap)

    def is_empty(self):
        """Return True when the heap has no items."""
        return len(self._heap) == 0

Time Complexity of _heapify

    def _heapify(self):
        for idx in range((len(self._heap) - 2) // 2, -1, -1):
            self._sift_down(idx)

The time complexity is O(N). This method performs a sift-down operation on all nodes, starting from the last non-leaf node up to the root.

An intuitive explanation is: "Lower nodes are many but move very little. Upper nodes move more but are extremely few."

Level (from bottom)    Node count (approx)      Max work per node
-------------------------------------------------------------------
0  (leaves)            n/2                      0
1                      n/4                      1
2                      n/8                      2
3                      n/16                     3
...                    ...                      ...
H  (root)              1                        H
  • Leaves (≈ n/2 nodes): Require zero work.
  • Level 1 (≈ n/4 nodes): Move at most one step.
  • Level 2 (≈ n/8 nodes): Move at most two steps.
  • Root: Can move H steps, but there is only one root.

This "many-but-cheap vs. few-but-expensive" pattern results in a linear total cost.

We can calculate the total work by summing the work for nodes at each height h:

Total Work = Σ (number of nodes at height h) × (cost per node)

Approximating the number of nodes at height h as n / 2^(h+1) and the cost as h:

T(n) ≈ Σ [h=0 to ∞] (n / 2^(h+1)) * h
T(n) ≈ n * Σ [h=0 to ∞] (h / 2^(h+1))

Let S be the sum:

S = Σ [h=1 to ∞] (h / 2^h)
S = 1/2 + 2/4 + 3/8 + 4/16 + ...

We can solve for S by computing S/2:

S/2 = 1/4 + 2/8 + 3/16 + 4/32 + ...

Subtracting S/2 from S:

S - S/2 = (1/2) + (2/4 - 1/4) + (3/8 - 2/8) + ...
S/2 = 1/2 + 1/4 + 1/8 + ...

The right-hand side is a geometric series that sums to 1. Therefore:

S/2 = 1  =>  S = 2

Plugging this back into our formula for T(n):

T(n) ≈ n * (S/2)
T(n) ≈ n * 1 = n

Thus, heapify runs in O(N) time.