Memory mapping files
Every now and again I dig up my old MUD language (AMUL/SMUGL) and tinker with the source code. Some time last year I used it to explore various optimization/profiling tools and found that a large portion of the compilation process was taken up with simple disk IO, almost all of it reads: I’d found myself an excuse to experiment with mmap().
I quickly found that while Windows doesn’t support mmap(), it provides its own, in some ways superior, MapViewOfFile. Ultimately, both systems return a pointer to address space where the file’s contents magically appear in memory, without you needing to call read() etc.
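To make the "pointer to the file's contents" idea concrete, here is a minimal POSIX-only sketch. The function name count_newlines_mapped is made up for illustration and is not taken from the linked source. On Windows the rough equivalent would go through CreateFileMapping and MapViewOfFile instead.

```cpp
#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close
#include <cassert>
#include <cstddef>

// Hypothetical helper: counts '\n' bytes in a file by mapping it instead of
// calling read(). Returns -1 on failure.
long count_newlines_mapped(const char* path) {
    assert(path != nullptr);
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;

    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return -1; }
    if (st.st_size == 0) { ::close(fd); return 0; }  // mmap rejects length 0

    void* addr = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                        PROT_READ, MAP_PRIVATE, fd, 0);
    ::close(fd);  // the mapping stays valid after the descriptor is closed
    if (addr == MAP_FAILED) return -1;

    // From here on the file is just bytes behind a pointer.
    const char* data = static_cast<const char*>(addr);
    long lines = 0;
    for (off_t i = 0; i < st.st_size; ++i)
        if (data[i] == '\n') ++lines;

    ::munmap(addr, static_cast<std::size_t>(st.st_size));
    return lines;
}
```

Note that there is no read() call and no user-side buffer anywhere: the loop walks the mapped pages directly.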
I was pleasantly surprised by how easy it was to use both systems, and they are similar enough that I was able to do so while building a simple “MappedFile” C++ class wrapper for the process. For source, see http://www.kfs.org/oliver/code/io_mapped_file/ - there’s also a Linux-based mmap() vs read() comparison, and a poor-man’s grep/find example app.
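For a feel of how similar the two APIs are, here is a rough read-only wrapper sketch. It is not the MappedFile class from the link above: the class name MappedFileSketch and its members are hypothetical, error handling is reduced to an ok() flag, and an empty file is simply treated as "not mapped".

```cpp
#ifdef _WIN32
#  include <windows.h>
#else
#  include <fcntl.h>
#  include <sys/mman.h>
#  include <sys/stat.h>
#  include <unistd.h>
#endif
#include <cassert>
#include <cstddef>

// Hypothetical RAII wrapper: maps a whole file read-only, unmaps on destruction.
class MappedFileSketch {
public:
    explicit MappedFileSketch(const char* path) {
        assert(path != nullptr);
#ifdef _WIN32
        HANDLE file = CreateFileA(path, GENERIC_READ, FILE_SHARE_READ, nullptr,
                                  OPEN_EXISTING, FILE_ATTRIBUTE_NORMAL, nullptr);
        if (file == INVALID_HANDLE_VALUE) return;
        LARGE_INTEGER size;
        if (GetFileSizeEx(file, &size) && size.QuadPart > 0) {
            HANDLE mapping = CreateFileMappingA(file, nullptr, PAGE_READONLY,
                                                0, 0, nullptr);
            if (mapping != nullptr) {
                data_ = MapViewOfFile(mapping, FILE_MAP_READ, 0, 0, 0);
                if (data_ != nullptr)
                    size_ = static_cast<std::size_t>(size.QuadPart);
                CloseHandle(mapping);  // the view keeps the mapping object alive
            }
        }
        CloseHandle(file);
#else
        int fd = ::open(path, O_RDONLY);
        if (fd < 0) return;
        struct stat st;
        if (::fstat(fd, &st) == 0 && st.st_size > 0) {
            void* p = ::mmap(nullptr, static_cast<std::size_t>(st.st_size),
                             PROT_READ, MAP_PRIVATE, fd, 0);
            if (p != MAP_FAILED) {
                data_ = p;
                size_ = static_cast<std::size_t>(st.st_size);
            }
        }
        ::close(fd);  // the mapping outlives the descriptor
#endif
    }

    ~MappedFileSketch() {
        if (data_ == nullptr) return;
#ifdef _WIN32
        UnmapViewOfFile(data_);
#else
        ::munmap(data_, size_);
#endif
    }

    // Non-copyable: two owners would unmap the same view twice.
    MappedFileSketch(const MappedFileSketch&) = delete;
    MappedFileSketch& operator=(const MappedFileSketch&) = delete;

    bool ok() const { return data_ != nullptr; }
    const char* data() const { return static_cast<const char*>(data_); }
    std::size_t size() const { return size_; }

private:
    void* data_ = nullptr;
    std::size_t size_ = 0;
};
```

Usage would look something like `MappedFileSketch f("input.txt"); if (f.ok()) { /* scan f.data()[0 .. f.size()) */ }`, with the unmap happening automatically when f goes out of scope.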
Both mmap and MapViewOfFile can deliver huge performance boosts to disk-IO hungry applications by allowing you to bypass the copy-to-local-buffer step of invoking read().
Most of us think of read() as the system call that “fetches data from the disk” (in this particular context). read() can cause data to be retrieved from disk, but that data lands first in disk-cache buffers managed by the operating system and is only then copied into the buffer you supplied.
Read is usually wrapped with one of three approaches to pulling data from a file:
- Use a small stack buffer and read the file a chunk at a time. No memory management overhead: loop until read says end of file or nothing is left. Great when reading fixed-size blobs of data from a file, terrible when working with stuff like lines of text.
  - High memory bandwidth: continuously copying memory from kernel space to your little buffer,
  - Cache entropy: both locations become cache “hot”, forcing other stuff out of cache,
  - Under load, can double the number of page faults.
- Get the size of the file, allocate sufficient memory, and read the whole file at once, usually when the files are very big.
  - Resource contention: may decrease memory available to the OS for disk buffering,
  - 2x+ the number of page faults: +1 for each OS-cache page populated, +1 for each copy to your memory, +1 for each access by your application,
  - Disk-cache contention: if you’re loading a particularly large file, the OS may reach a point where it decides that areas of your allocation populated some pages ago have gone “cold” and writes them out to swap, i.e. disk.
- Combination of 1 & 2: read into a static buffer, then copy to dynamically allocated and grown memory if we fill the buffer.
  - World o’ hurt: read, realloc, copy (realloc == malloc, memcpy, free), read, realloc, copy, animate hourglass, …
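For reference, the first two approaches look roughly like this on POSIX. Both function names are made up for illustration, the 4096-byte buffer size is an arbitrary choice, and retrying on EINTR is left out to keep the sketch short.

```cpp
#include <fcntl.h>     // open
#include <sys/stat.h>  // fstat
#include <unistd.h>    // read, close
#include <cassert>
#include <cstddef>
#include <vector>

// Approach 1 (hypothetical helper): fixed stack buffer, loop until read()
// returns 0 (end of file) or -1 (error). Every chunk is copied from the
// OS cache into buf before we get to look at it.
long count_newlines_chunked(const char* path) {
    assert(path != nullptr);
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return -1;
    char buf[4096];
    long lines = 0;
    ssize_t n;
    while ((n = ::read(fd, buf, sizeof buf)) > 0)
        for (ssize_t i = 0; i < n; ++i)
            if (buf[i] == '\n') ++lines;
    ::close(fd);
    return n < 0 ? -1 : lines;
}

// Approach 2 (hypothetical helper): size the buffer from fstat(), then pull
// the whole file into memory the application owns.
bool read_whole_file(const char* path, std::vector<char>& out) {
    assert(path != nullptr);
    int fd = ::open(path, O_RDONLY);
    if (fd < 0) return false;
    struct stat st;
    if (::fstat(fd, &st) != 0) { ::close(fd); return false; }
    const std::size_t expected = static_cast<std::size_t>(st.st_size);
    out.resize(expected);
    std::size_t done = 0;
    while (done < expected) {  // read() may return fewer bytes than asked for
        ssize_t n = ::read(fd, out.data() + done, expected - done);
        if (n <= 0) break;     // error or unexpected end of file
        done += static_cast<std::size_t>(n);
    }
    ::close(fd);
    out.resize(done);
    return done == expected;
}
```

In both cases the bytes exist twice while you work on them: once in the OS cache and once in buf or out.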
In most OSes, copying data from kernel space to userland carries an extra overhead that most programmers overlook or are unaware of. The mmap/MapViewOfFile mechanisms provide you with a pointer to a virtual range of memory: the addresses you access aren’t real, but the CPU translates them in hardware to the real, physical location where the OS stored the data in its cache buffer. This strategy is actually what’s behind read() and friends in the first place, so you really are just cutting out the middleman, i.e. read() and your destination-buffer copy.
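Mmap/MapViewOfFile is not always the solution: the virtual address part makes it less than free. The data is accessible from user space, but it lives in pages owned by the OS cache rather than in memory your application allocated, and each page still has to be faulted into your address space the first time you touch it.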
If you’re going to be repeatedly random-accessing a small file, say 64-256 KB, in a relatively tight loop, then it’s probably better to reap the benefit of accessing the data in user space by allocating and loading it.