LeetCode: 8. String to Integer (atoi)
LeetCode: 8. String to Integer (atoi)
Summary
Convert a string to a 32-bit signed integer.
Approach
Use a state machine.
We can prepare the following states:
- BEFORE_SIGN
- SIGNED
Function
Initialization:
- state: BEFORE_SIGN
- sign: 1
- numbers: []
Main loop:
For each character ch in the given string:
- If the state is BEFORE_SIGN:
- if
chis whitespace:- continue
- else if
chis"+"or"-":- update state to SIGNED
- if
chis"-", setsign = -1
- else if
chis"0-9":- update state to SIGNED
- add
chtonumbers
- else:
- break # non-digit character
- if
- If the state is SIGNED:
- if
chis"0-9":- add
chtonumbers
- add
- else:
- break
- if
Convert numbers to an integer by joining the characters and calling int().
Clamp the result if it is outside the range [-2^31, 2^31 - 1].
Complexity
Time Complexity: O(N) where N is the length of the given string. We traverse each character once, then join and convert the collected digits to an integer. Space Complexity: O(N) to store number characters.
Tips
Use ch.isdigit() rather than ch in string.digits
First, ch in string.digits is O(1) because string.digits has a constant length ("0123456789").
str.isdigit() is also O(1), and it is typically faster because it is optimized in C.
Benchmark results:
Benchmark: ch.isdigit() vs ch in string.digits
data length: 30,000
loops per repeat: 2,000, repeats: 5
best ch.isdigit(): 1.300643 sec
best ch in string.digits: 1.649056 sec
speedup (in_digits / isdigit): 1.27x
import string
import timeit
def benchmark_isdigit(data: str) -> int:
return sum(ch.isdigit() for ch in data)
def benchmark_in_string_digits(data: str) -> int:
return sum(ch in string.digits for ch in data)
def run_benchmark() -> None:
# ASCII digits, letters, and symbols mixed.
data = ("abcXYZ0123456789-_=+!@#$%^&*()" * 1000)
number = 2000
repeat = 5
timer_globals = {
"benchmark_isdigit": benchmark_isdigit,
"benchmark_in_string_digits": benchmark_in_string_digits,
"data": data,
}
isdigit_times = timeit.repeat(
"benchmark_isdigit(data)",
globals=timer_globals,
number=number,
repeat=repeat,
)
in_digits_times = timeit.repeat(
"benchmark_in_string_digits(data)",
globals=timer_globals,
number=number,
repeat=repeat,
)
best_isdigit = min(isdigit_times)
best_in_digits = min(in_digits_times)
print("Benchmark: ch.isdigit() vs ch in string.digits")
print(f"data length: {len(data):,}")
print(f"loops per repeat: {number:,}, repeats: {repeat}")
print(f"best ch.isdigit(): {best_isdigit:.6f} sec")
print(f"best ch in string.digits: {best_in_digits:.6f} sec")
print(f"speedup (in_digits / isdigit): {best_in_digits / best_isdigit:.2f}x")
if __name__ == "__main__":
run_benchmark()