Optimize: LeetCode Subarray Sum Equals K

Original version

class Solution:
    def subarraySum(self, nums: List[int], k: int) -> int:
        mem = {0: [-1]}
        ans = 0
        total = 0
        for i, n in enumerate(nums):
            total += n
            if total-k in mem:
                ans += len(mem[total-k])
            if total not in mem:
                mem[total] = []
            mem[total].append(i)
        return ans

This version works, but it stores more information than necessary and does extra dictionary work inside the loop.

What is inefficient here?

There are two main issues:

  • We store a list of indices for each prefix sum, even though we only need the count.
  • We do repeated dictionary checks such as if key in dict and then access the same key again.

Refactor idea

1. Store counts, not index lists

We only need to know how many times a prefix sum has appeared. We do not need the actual indices.

So instead of:

  • mem[sum] = [index1, index2, ...]

we can store:

  • mem[sum] = count

That gives us this version:

class Solution:
    def subarraySum(self, nums: List[int], k: int) -> int:
        mem = {0: 1}
        ans = 0
        total = 0
        for n in nums:
            total += n
            if total-k in mem:
                ans += mem[total-k]
            if total not in mem:
                mem[total] = 0
            mem[total] += 1
        return ans

Using an integer counter is cheaper than creating and growing lists.

2. Reduce repeated dictionary lookups

This pattern:

if total not in mem:
    mem[total] = 0
mem[total] += 1

can be simplified to:

mem[total] = mem.get(total, 0) + 1

This is shorter and avoids the separate membership check.

Why dict.get helps

Consider this common pattern:

if key in d:
    value = d[key]

It usually performs two dictionary operations:

  • check whether the key exists
  • fetch the value

By contrast, d.get(key, default) combines that into one access pattern and also removes the Python-level if block.

That makes the loop simpler:

ans += count.get(total - k, 0)
count[total] = count.get(total, 0) + 1

Final version

class Solution:
    def subarraySum(self, nums: List[int], k: int) -> int:
        count = {0: 1}
        total = 0
        ans = 0

        for n in nums:
            total += n
            ans += count.get(total - k, 0)
            count[total] = count.get(total, 0) + 1

        return ans

Summary

A small refactor improves both readability and performance:

  • store counts instead of index lists
  • use dict.get instead of separate membership checks
  • keep the core prefix-sum idea unchanged

The algorithm is still O(N), but the constant factors become smaller.

Appendix: Why can dict.get be faster?

This is not the main algorithmic improvement, but it helps explain why the refactor can still matter in Python.

Python if runs as bytecode

For example:

if x:
    y += 1

Python executes this as bytecode instructions such as:

LOAD_FAST x
POP_JUMP_IF_FALSE L1
LOAD_FAST y
LOAD_CONST 1
INPLACE_ADD
STORE_FAST y
L1:

So a simple if statement is not a single machine-level operation. It goes through multiple interpreter steps such as:

  • loading values
  • checking truthiness
  • jumping to another instruction

In CPython, bytecode is executed by the interpreter loop, so each Python-level operation carries overhead.

dict.get() pushes the work into C

When you write:

count.get(total - k, 0)

it is one Python-level method call, but the actual dictionary lookup happens inside CPython's C implementation.

That means the hash calculation, key lookup, and default handling are all performed in C rather than being split across multiple Python-level steps.

Why that can help

Compare these two patterns:

if total - k in count:
    ans += count[total - k]

and

ans += count.get(total - k, 0)

The first version does:

  • a Python if
  • a dictionary membership check
  • another dictionary access

The second version expresses the same intent more directly and avoids the separate membership test.

General Python performance rule

In many cases, Python gets slower when more work is done at the Python level:

  • Python loops
  • Python conditionals
  • Python function calls
  • repeated attribute or dictionary access

It often gets faster when more work is delegated to built-in functions and library code implemented in C.

That is why small rewrites like dict.get(...) can sometimes improve performance even when the overall algorithm stays the same.