Many Small Writes to a File in Rust

Context

I'm writing a simple database and implementing file-writing logic.

I looked into how the file API is implemented and ran some benchmarks to measure performance and understand the mechanism.

Now, I understand the basic flow of file I/O in Rust and the tradeoffs around I/O performance.

Let me share my learnings.

What happens when you call write_all?

When you call write_all on a File, the data is copied into the OS page cache through a system call such as write(2); it is not written to disk immediately. If a single call cannot write the entire buffer, write_all calls the underlying write operation repeatedly until all bytes are written or an error occurs. The OS then syncs the data to disk asynchronously. Because the data already sits in the page cache, other processes reading the file see the update right away, since they read through the same cache.

Rust write_all call
-> system call: write data into the OS page cache

OS:
-> sync data from the cache to disk asynchronously

The sync is deferred because forcing every write to hit the disk immediately would degrade I/O performance. The page cache does not remove the per-call cost, though. File has no buffer of its own, so if you have a large batch of small records and call write_all for each one, each call usually crosses into the kernel. That causes a flood of system calls.
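To make the retry behaviour of write_all concrete, here is a minimal sketch of such a loop. It is not the standard library's source; write_all_sketch and ShortWriter are hypothetical names I made up for illustration, and ShortWriter simply pretends to be a file descriptor that accepts at most 3 bytes per call.

```rust
use std::io::{self, ErrorKind, Write};

/// Hypothetical re-implementation of the retry loop, for illustration only.
fn write_all_sketch<W: Write>(writer: &mut W, mut buf: &[u8]) -> io::Result<()> {
    while !buf.is_empty() {
        match writer.write(buf) {
            // Zero bytes accepted means no progress can be made.
            Ok(0) => {
                return Err(io::Error::new(
                    ErrorKind::WriteZero,
                    "failed to write whole buffer",
                ));
            }
            // Partial write: drop the written prefix and try again.
            Ok(n) => buf = &buf[n..],
            // An interrupted call is retried rather than reported.
            Err(e) if e.kind() == ErrorKind::Interrupted => {}
            Err(e) => return Err(e),
        }
    }
    Ok(())
}

/// Hypothetical writer that accepts at most 3 bytes per call and counts calls.
struct ShortWriter {
    data: Vec<u8>,
    calls: usize,
}

impl Write for ShortWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        let n = buf.len().min(3);
        self.data.extend_from_slice(&buf[..n]);
        self.calls += 1;
        Ok(n)
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

fn main() -> io::Result<()> {
    let mut w = ShortWriter { data: Vec::new(), calls: 0 };
    write_all_sketch(&mut w, b"hello world")?;
    println!("{} bytes in {} write calls", w.data.len(), w.calls);
    Ok(())
}
```

With a real File each of those inner write calls would be a system call, which is why many tiny write_all calls get expensive.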

Stdout uses a global buffer

File has no internal buffer for writes, but stdout does. As documented:

Each handle shares a global buffer of data to be written to the standard output stream. Access is also synchronized via a lock and explicit control over locking is available via the lock method.

https://doc.rust-lang.org/std/io/struct.Stdout.html

Now the reason for the shared buffer should be clear: buffering reduces system calls. With a buffer, the flow is:

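Rust stdout write
-> write data to a buffer inside the process (held by the standard library), no system call yet
-> on flush, send the buffered data to the OS with a system call

The buffer keeps the data in memory, so each small write does not immediately execute a system call. A system call is triggered when the buffer is flushed, and the buffered data is then sent to the underlying file descriptor. A flush happens when you call flush explicitly or when the buffer fills up; stdout is also currently line-buffered, so writing a newline typically flushes as well.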
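As a small illustration of the lock method and an explicit flush, here is a sketch. write_lines is a hypothetical helper of mine, written against any Write so it can be pointed at stdout or at an in-memory Vec<u8>.

```rust
use std::io::{self, Write};

/// Hypothetical helper: writes `n` short lines to any writer.
fn write_lines<W: Write>(out: &mut W, n: usize) -> io::Result<()> {
    for i in 0..n {
        writeln!(out, "record {i}")?;
    }
    Ok(())
}

fn main() -> io::Result<()> {
    // Take the lock once instead of re-acquiring it for every line.
    let mut out = io::stdout().lock();
    write_lines(&mut out, 3)?;
    // Push anything still sitting in the buffer to the file descriptor.
    out.flush()
}
```

Holding the lock for the whole loop avoids per-call synchronization; it does not change when the buffer is flushed.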

File does not buffer by default, but you can still call flush on it, just as with stdout. For File, however, it does nothing, as documented:

Since a File structure doesn’t contain any buffers, this function is currently a no-op on Unix and Windows. Note that this may change in the future.

https://doc.rust-lang.org/beta/std/fs/struct.File.html#method.flush

I'll explain later how and why to use a buffer for file I/O.

Can we immediately write our data to disk?

Yes. You can use the sync_all API, which is documented as follows:

Attempts to sync all OS-internal file content and metadata to disk.

https://doc.rust-lang.org/beta/std/fs/struct.File.html#method.sync_all

It's important to use this API in data-intensive applications like databases. If the machine crashes or loses power while data exists only in the OS page cache and has not yet been synced to disk, that data is lost.

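Rust also has sync_data, which is similar but may skip metadata that is not needed to read the data back, such as modification timestamps. If your application needs that metadata to be durable too, sync_all is the stronger choice. Note that neither call is guaranteed to persist the directory entry of a newly created file; on Unix-like systems that typically requires syncing the parent directory as well.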
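Here is a minimal sketch of how a durable append could look with these APIs. append_durable and the file name are hypothetical; a real database would also need to think about batching, since syncing after every record is slow.

```rust
use std::fs::OpenOptions;
use std::io::{self, Write};
use std::path::Path;

/// Hypothetical helper: append one record and wait until the OS reports it synced.
fn append_durable(path: &Path, record: &[u8]) -> io::Result<()> {
    let mut file = OpenOptions::new().create(true).append(true).open(path)?;
    // After this call the bytes are in the OS page cache.
    file.write_all(record)?;
    // Ask the OS to push file content and metadata to the storage device.
    // file.sync_data() would be the lighter variant.
    file.sync_all()
}

fn main() -> io::Result<()> {
    // Illustrative path in the system temp directory.
    let path = std::env::temp_dir().join("append_durable_demo.log");
    append_durable(&path, b"key=value\n")?;
    println!("wrote to {}", path.display());
    Ok(())
}
```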

Performance: File writes vs BufWriter

Here's a more realistic situation. When you write a log collector application, the application collects many logs via network and saves them into files. How do you structure the writer? Do you write directly to File, or do you use a buffer? Let's test it.

Conditions:

  • Total data size is 100 MiB.
  • We use different record sizes: 1 B, 100 B, 1 KiB, and 8 KiB.
  • Writes are single-threaded and sequential.
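For reference, a benchmark under these conditions could be structured roughly like the sketch below. This is not the exact code behind the results; run, the file names, and the reduced 1 MiB total are assumptions chosen to keep the demo quick.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::time::{Duration, Instant};

/// Writes about `total` bytes in `record_size` chunks.
/// Returns the number of write_all calls and the elapsed time.
/// Hypothetical harness, not the original benchmark code.
fn run<W: Write>(mut w: W, record_size: usize, total: usize) -> io::Result<(usize, Duration)> {
    let record = vec![b'x'; record_size];
    let calls = total / record_size;
    let start = Instant::now();
    for _ in 0..calls {
        w.write_all(&record)?;
    }
    // Include the final flush so buffered bytes are part of the timing.
    w.flush()?;
    Ok((calls, start.elapsed()))
}

fn main() -> io::Result<()> {
    let total = 1024 * 1024; // 1 MiB instead of 100 MiB
    let dir = std::env::temp_dir();
    for record_size in [100, 1024, 8 * 1024] {
        let direct = File::create(dir.join("bench_direct.bin"))?;
        let (calls, t_direct) = run(direct, record_size, total)?;
        let buffered = BufWriter::new(File::create(dir.join("bench_buffered.bin"))?);
        let (_, t_buffered) = run(buffered, record_size, total)?;
        println!("{record_size} B x {calls}: File {t_direct:?}, BufWriter {t_buffered:?}");
    }
    Ok(())
}
```

Absolute numbers depend heavily on the OS, file system, and hardware, so treat any output as relative.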

Results:

Method     Record size  write_all calls  Size       Elapsed    Throughput
File       1 B          104857600        100.0 MiB  125.986 s  0.8 MiB/s
BufWriter  1 B          104857600        100.0 MiB  0.199 s    501.8 MiB/s
File       100 B        1048576          100.0 MiB  1.282 s    78.0 MiB/s
BufWriter  100 B        1048576          100.0 MiB  0.079 s    1259.1 MiB/s
File       1 KiB        102400           100.0 MiB  0.204 s    490.6 MiB/s
BufWriter  1 KiB        102400           100.0 MiB  0.105 s    950.4 MiB/s
File       8 KiB        12800            100.0 MiB  0.075 s    1332.5 MiB/s
BufWriter  8 KiB        12800            100.0 MiB  0.071 s    1410.0 MiB/s


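The 1 B case is not realistic, but it makes the effect obvious: writing through a buffer is roughly 630 times faster than writing directly to File, because buffering reduces the number of system calls. The gap shrinks as records grow and almost disappears at 8 KiB. That matches BufWriter's default buffer capacity, which is currently 8 KiB: at that record size there is little left to batch.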
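You can see the reduction in underlying writes without touching the disk at all. In the sketch below, CountingWriter is a hypothetical stand-in for File that only counts how often write is called; with a real File, each of those calls would be a system call.

```rust
use std::io::{self, BufWriter, Write};

/// Hypothetical stand-in for a File: counts how often `write` is called.
struct CountingWriter {
    calls: usize,
}

impl Write for CountingWriter {
    fn write(&mut self, buf: &[u8]) -> io::Result<usize> {
        self.calls += 1;
        Ok(buf.len())
    }

    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Writes `n` records of 100 bytes and returns the number of underlying write calls.
fn count_calls(buffered: bool, n: usize) -> io::Result<usize> {
    let record = [0u8; 100];
    let mut inner = CountingWriter { calls: 0 };
    if buffered {
        // 8 KiB buffer in front of the counting writer.
        let mut w = BufWriter::with_capacity(8 * 1024, &mut inner);
        for _ in 0..n {
            w.write_all(&record)?;
        }
        w.flush()?;
    } else {
        // Every write_all goes straight to the underlying writer.
        for _ in 0..n {
            inner.write_all(&record)?;
        }
    }
    Ok(inner.calls)
}

fn main() -> io::Result<()> {
    let n = 1_000;
    println!("direct:   {} write calls", count_calls(false, n)?);
    println!("buffered: {} write calls", count_calls(true, n)?);
    Ok(())
}
```

The direct path makes one underlying call per record, while the buffered path only calls through when the buffer has to be emptied.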

However, you need to consider the tradeoff between performance and data safety/consistency. Using a buffer means that data stays in the process's own memory for a while. If the process crashes before the buffer is flushed to the OS, that data is lost. Even after flushing to the OS, the data can still be lost on a machine crash unless it has been synced to disk.
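Putting the two steps together, a buffered write that ends in a durable state could look like this minimal sketch. write_batch and the file name are hypothetical; the point is only the order of flush and sync_all.

```rust
use std::fs::File;
use std::io::{self, BufWriter, Write};
use std::path::Path;

/// Hypothetical helper: write records through a buffer, then make them durable.
fn write_batch(path: &Path, records: &[&[u8]]) -> io::Result<()> {
    let mut writer = BufWriter::new(File::create(path)?);
    for record in records {
        // Usually this only copies into the in-process buffer.
        writer.write_all(record)?;
    }
    // Step 1: move buffered bytes from the process to the OS page cache.
    writer.flush()?;
    // Step 2: ask the OS to persist the file to the storage device.
    writer.get_ref().sync_all()
}

fn main() -> io::Result<()> {
    let path = std::env::temp_dir().join("write_batch_demo.log");
    let records: [&[u8]; 2] = [b"first\n", b"second\n"];
    write_batch(&path, &records)?;
    println!("wrote {} records to {}", records.len(), path.display());
    Ok(())
}
```

Dropping a BufWriter also attempts a flush, but any error during that flush is ignored, so calling flush explicitly is the safer habit when you care about the result.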

Conclusion

I learned how writing data to a file generally works in an application, and I confirmed that system calls become the bottleneck when we make many small writes to a single file. Using a buffer improves performance, but it comes with a durability tradeoff you need to consider.