LeetCode: 8. String to Integer (atoi)

LeetCode: 8. String to Integer (atoi)

LeetCode: 8. String to Integer (atoi)

Summary

Convert a string to a 32-bit signed integer.

Approach

Use a state machine.

We can prepare the following states:

  • BEFORE_SIGN
  • SIGNED

Function

Initialization:

  • state: BEFORE_SIGN
  • sign: 1
  • numbers: []

Main loop:

For each character ch in the given string:

  • If the state is BEFORE_SIGN:
    • if ch is whitespace:
      • continue
    • else if ch is "+" or "-":
      • update state to SIGNED
      • if ch is "-", set sign = -1
    • else if ch is "0-9":
      • update state to SIGNED
      • add ch to numbers
    • else:
      • break # non-digit character
  • If the state is SIGNED:
    • if ch is "0-9":
      • add ch to numbers
    • else:
      • break

Convert numbers to an integer by joining the characters and calling int(). Clamp the result if it is outside the range [-2^31, 2^31 - 1].

Complexity

Time Complexity: O(N) where N is the length of the given string. We traverse each character once, then join and convert the collected digits to an integer. Space Complexity: O(N) to store number characters.

Tips

Use ch.isdigit() rather than ch in string.digits

First, ch in string.digits is O(1) because string.digits has a constant length ("0123456789").

str.isdigit() is also O(1), and it is typically faster because it is optimized in C.

Benchmark results:

Benchmark: ch.isdigit() vs ch in string.digits
data length: 30,000
loops per repeat: 2,000, repeats: 5
best ch.isdigit():          1.300643 sec
best ch in string.digits:   1.649056 sec
speedup (in_digits / isdigit): 1.27x
import string
import timeit


def benchmark_isdigit(data: str) -> int:
    return sum(ch.isdigit() for ch in data)


def benchmark_in_string_digits(data: str) -> int:
    return sum(ch in string.digits for ch in data)


def run_benchmark() -> None:
    # ASCII digits, letters, and symbols mixed.
    data = ("abcXYZ0123456789-_=+!@#$%^&*()" * 1000)
    number = 2000
    repeat = 5
    timer_globals = {
        "benchmark_isdigit": benchmark_isdigit,
        "benchmark_in_string_digits": benchmark_in_string_digits,
        "data": data,
    }

    isdigit_times = timeit.repeat(
        "benchmark_isdigit(data)",
        globals=timer_globals,
        number=number,
        repeat=repeat,
    )
    in_digits_times = timeit.repeat(
        "benchmark_in_string_digits(data)",
        globals=timer_globals,
        number=number,
        repeat=repeat,
    )

    best_isdigit = min(isdigit_times)
    best_in_digits = min(in_digits_times)

    print("Benchmark: ch.isdigit() vs ch in string.digits")
    print(f"data length: {len(data):,}")
    print(f"loops per repeat: {number:,}, repeats: {repeat}")
    print(f"best ch.isdigit():          {best_isdigit:.6f} sec")
    print(f"best ch in string.digits:   {best_in_digits:.6f} sec")
    print(f"speedup (in_digits / isdigit): {best_in_digits / best_isdigit:.2f}x")


if __name__ == "__main__":
    run_benchmark()