But How Do Databases Use Mmap?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • imdb-rename

    A command line tool to rename media files based on titles from IMDb.

  • > How else would you lazy-load a database of (say) 32GB into memory, almost instantly?

    That's what the fst crate[1] does. It's likely working at a lower level of abstraction than you intend. But the point is that it works, is portable and doesn't require any cooperation from the OS other than the ability to memory map files. My imdb-rename tool[2] uses this technique to build an on-disk database for instantaneous searching. And then there is the regex-automata crate[3] that permits deserializing a regex instantaneously from any kind of slice of bytes.[4]

    I think you should maybe provide some examples of what you're suggesting to make it more concrete.

    [1] - https://crates.io/crates/fst

    [2] - https://github.com/BurntSushi/imdb-rename

    [3] - https://crates.io/crates/regex-automata

    [4] - https://docs.rs/regex-automata/0.1.9/regex_automata/#example...
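
To make the lazy-loading idea in the comment above a bit more concrete, here is a minimal Python sketch. It is not how the fst crate works (fst stores finite state transducers); it swaps in a much simpler structure, a file of sorted fixed-width keys, searched directly through a memory map. The record layout and the names `build_db`, `open_db` and `contains` are assumptions made up for illustration.

```python
import mmap
import struct

# Hypothetical on-disk layout: one little-endian u64 key per record.
RECORD = struct.Struct("<Q")


def build_db(path, keys):
    """Write the keys, sorted, as fixed-width records."""
    with open(path, "wb") as f:
        for key in sorted(keys):
            f.write(RECORD.pack(key))


def open_db(path):
    """Map the file read-only. No file data is read up front;
    pages are faulted in by the OS as they are touched."""
    with open(path, "rb") as f:
        # The mapping stays valid after the file object is closed.
        return mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)


def contains(mm, key):
    """Binary search over the mapped bytes, decoding one record per probe."""
    lo, hi = 0, len(mm) // RECORD.size
    while lo < hi:
        mid = (lo + hi) // 2
        (value,) = RECORD.unpack_from(mm, mid * RECORD.size)
        if value == key:
            return True
        if value < key:
            lo = mid + 1
        else:
            hi = mid
    return False
```

Opening the map costs about the same whether the file is 32 bytes or 32GB, and a lookup only touches the handful of pages the binary search lands on.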

  • project-safe-transmute

    Project group working on the "safe transmute" feature (by jswrenn)

  • It's likely that the "safe transmute" working group[1] will help facilitate this sort of thing. They have an RFC[2]. See also the bytemuck[3] and zerocopy[4] crates which predate the RFC, where at least the latter has 'derive' functionality.

    [1] - https://github.com/rust-lang/project-safe-transmute

    [2] - https://github.com/jswrenn/project-safe-transmute/blob/rfc/r...

    [3] - https://docs.rs/bytemuck/1.5.0/bytemuck/

    [4] - https://docs.rs/zerocopy/0.3.0/zerocopy/index.html
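
As a rough analogue of what the comment above is describing (viewing raw bytes as typed values without copying them), here is a small Python sketch using `memoryview.cast`. It is not the API of bytemuck, zerocopy or the RFC, and the helper name `view_u32` is hypothetical. The only safety check it performs is that the length is a multiple of the item size.

```python
import struct

ITEM_SIZE = struct.calcsize("I")  # native unsigned int, 4 bytes on common platforms


def view_u32(buf):
    """Reinterpret a bytes-like object as native unsigned ints without copying."""
    view = memoryview(buf)
    if view.nbytes % ITEM_SIZE != 0:
        raise ValueError("buffer length must be a multiple of the item size")
    # cast() shares the underlying buffer; no bytes are copied.
    return view.cast("I")
```

Because the view shares memory with the original buffer, the same function can be pointed at a memory-mapped file to read typed values straight out of the map.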

  • httpdirfs

    A filesystem which allows you to mount HTTP directory listings, with a permanent cache. Now with Airsonic / Subsonic support!

  • You are correct, this works. There is even a file system built around this: https://github.com/fangfufu/httpdirfs

  • wg-allocators

    Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!

  • Work on custom allocators is underway; some of the std data structures already support them on nightly.

    https://github.com/rust-lang/wg-allocators/issues/7

  • direct-io

    Direct IO helpers for block devices and regular files on FreeBSD, Linux, macOS and Windows.

  • I wrote this native binding in C for Node.js, exposing cross-platform functionality: https://github.com/ronomon/direct-io

    Although if it's a new project and you're used to C, I would also recommend taking a good look at Zig (https://ziglang.org/), because it is much more explicit about alignment than C and makes alignment a first-class part of the type system. See this other comment of mine that goes into more detail: https://news.ycombinator.com/item?id=25801542

    Something that will also help is setting your minimum IO unit to 4096 bytes, the Advanced Format sector size, because then your Direct IO system will just work, regardless of whether sysadmins swap disks of different sector sizes from underneath you. For example, a minimum IO unit of 4096 bytes will work not only for newer AF disks but also for any disks with 512-byte sectors.

    Lastly, Direct IO is actually more a property of the file system than of the OS (e.g. Linux), so you will find some file systems on Linux that return EINVAL when you try to open a file descriptor with O_DIRECT, simply because they don't support O_DIRECT (e.g. a macOS volume accessed from within a Linux VM). That should be your way of testing for support, rather than checking only the OS.
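
The two pieces of advice above (use a 4096-byte minimum IO unit, and probe the file system rather than the OS) can be sketched in Python roughly as follows. This is not code from the direct-io package; the names `SECTOR`, `align_up` and `supports_o_direct` are hypothetical, and the probe only checks whether the open call succeeds.

```python
import errno
import os

SECTOR = 4096  # assumed minimum IO unit: the Advanced Format sector size


def align_up(n, unit=SECTOR):
    """Round n up to the next multiple of unit (a multiple of 4096 is also a multiple of 512)."""
    return (n + unit - 1) // unit * unit


def supports_o_direct(path):
    """Return True if the file at path can be opened with O_DIRECT."""
    flag = getattr(os, "O_DIRECT", None)  # not defined on every platform
    if flag is None:
        return False
    try:
        fd = os.open(path, os.O_RDONLY | flag)
    except OSError as exc:
        if exc.errno == errno.EINVAL:
            return False  # this file system rejected O_DIRECT
        raise
    os.close(fd)
    return True
```

A caller would run `supports_o_direct` against a file on the target volume at startup and fall back to buffered IO when it returns False, sizing every direct read or write with `align_up`.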
