LeetCode: 139. Word Break

Summary

Given a string s and a dictionary wordDict, determine whether s can be segmented into a space-separated sequence of one or more dictionary words.
Each word can be reused multiple times.

Approach

Use a trie and track multiple active matching states while scanning s.

We cannot follow only one path because words can share prefixes (for example, "app", "apple", and "application").
So at each character, we keep all possible trie nodes that represent valid partial matches.

At the end of the scan, if any active state is at a terminal trie node, the answer is true.

Algorithm

Initialization:

trie_root = Build a trie from wordDict.
states = [trie_root] (active trie nodes after consuming the current prefix of s).

Main Loop:

For each character ch in s:

Initialize:
- visited as a set of node IDs to avoid duplicate nodes in this step.
- new_states as an empty array.
For each node in states:
- If the node has child ch, move to that child and add it to new_states if not visited.
- If the current node is terminal (end of a word), we can start a new word:
  - If trie_root has child ch, add trie_root[ch] to new_states if not visited.
Replace states with new_states.
Also start from the beginning of s:
- If trie_root has child ch, include it in new_states for the first character (or whenever no active path exists yet).

Return true if any node in states is terminal after the final character; otherwise return false.

Example:

s = "leetcode", wordDict = ["leet", "code"]

trie = { l: { e: ... }, c: { o: ... } }
states = []

states = [root['l']]
states = [root['l']['e']]
...
states = [root['l']['e']['e']['t']] (terminal: "leet")
Start new word from root with 'c': states = [root['c']]
...
states = [root['c']['o']['d']['e']] (terminal: "code")

Complexity

Time complexity: O(NK + SW)

N: number of words in wordDict
K: average word length
S: length of s
W: maximum word length in wordDict

The NK term is for building the trie.
The SW term is for scanning s while tracking up to W active states in the worst case.

Worst case for len(states):

example: wordDict = ['a', 'aa', 'aaa', 'aaaa'], s = 'aaaaa'
then states evolve like:
- 1st loop: states = [root['a']]
- 2nd loop: states = [root['a'], root['a']['a']]
- 3rd loop: states = [root['a'], root['a']['a'], root['a']['a']['a']]
- 4th loop: states = [root['a'], root['a']['a'], root['a']['a']['a'], root['a']['a']['a']['a']]
- 5th loop: same as 4th (no further growth)

Space complexity: O(NK) for storing the trie (plus O(W) for active states).