-
httpdirfs
A filesystem which allows you to mount HTTP directory listings, with a permanent cache. Now with Airsonic / Subsonic support!
-
wg-allocators
Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
-
direct-io
Direct IO helpers for block devices and regular files on FreeBSD, Linux, macOS and Windows.
> How else would you lazy-load a database of (say) 32GB into memory, almost instantly?
That's what the fst crate[1] does. It's likely working at a lower level of abstraction than you intend. But the point is that it works, is portable and doesn't require any cooperation from the OS other than the ability to memory map files. My imdb-rename tool[2] uses this technique to build an on-disk database for instantaneous searching. And then there is the regex-automata crate[3] that permits deserializing a regex instantaneously from any kind of slice of bytes.[4]
I think you should maybe provide some examples of what you're suggesting to make it more concrete.
[1] - https://crates.io/crates/fst
[2] - https://github.com/BurntSushi/imdb-rename
[3] - https://crates.io/crates/regex-automata
[4] - https://docs.rs/regex-automata/0.1.9/regex_automata/#example...
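To make the "search a database straight from a slice of bytes" idea concrete, here is a minimal sketch. It is not how fst or regex-automata lay out their data; the record format (a little-endian u64 key followed by a little-endian u64 value, sorted by key) and the names `lookup` and `build_table` are assumptions made for illustration. The point is only that a lookup can run directly over `&[u8]`, which could just as well be a memory-mapped file, without a deserialization pass.

```rust
/// Width of one record: a little-endian u64 key followed by a little-endian u64 value.
/// This layout is hypothetical, chosen only to keep the sketch small.
const RECORD_LEN: usize = 16;

/// Decode the little-endian u64 stored at `offset` in `bytes`.
fn read_u64(bytes: &[u8], offset: usize) -> u64 {
    let mut buf = [0u8; 8];
    buf.copy_from_slice(&bytes[offset..offset + 8]);
    u64::from_le_bytes(buf)
}

/// Look up `key` in a table of records sorted by key.
/// Only the records visited by the binary search are read, so with a
/// memory-mapped table only the pages holding those records get touched.
fn lookup(table: &[u8], key: u64) -> Option<u64> {
    if table.len() % RECORD_LEN != 0 {
        return None; // malformed table: trailing partial record
    }
    let (mut lo, mut hi) = (0, table.len() / RECORD_LEN);
    while lo < hi {
        let mid = lo + (hi - lo) / 2;
        let offset = mid * RECORD_LEN;
        let candidate = read_u64(table, offset);
        if candidate == key {
            return Some(read_u64(table, offset + 8));
        } else if candidate < key {
            lo = mid + 1;
        } else {
            hi = mid;
        }
    }
    None
}

/// Serialize (key, value) pairs, already sorted by key, into the flat format above.
fn build_table(pairs: &[(u64, u64)]) -> Vec<u8> {
    let mut out = Vec::with_capacity(pairs.len() * RECORD_LEN);
    for (k, v) in pairs {
        out.extend_from_slice(&k.to_le_bytes());
        out.extend_from_slice(&v.to_le_bytes());
    }
    out
}
```

Because every field is decoded with `from_le_bytes`, the table has no alignment requirements and reads the same on any platform, at the cost of a small copy per field.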
It's likely that the "safe transmute" working group[1] will help facilitate this sort of thing. They have an RFC[2]. See also the bytemuck[3] and zerocopy[4] crates which predate the RFC, where at least the latter has 'derive' functionality.
[1] - https://github.com/rust-lang/project-safe-transmute
[2] - https://github.com/jswrenn/project-safe-transmute/blob/rfc/r...
[3] - https://docs.rs/bytemuck/1.5.0/bytemuck/
[4] - https://docs.rs/zerocopy/0.3.0/zerocopy/index.html
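As a rough illustration of the kind of check such crates and the RFC aim to take off your hands, here is a hand-written sketch using only the standard library. The helper names `view_as_u32` and `as_bytes` are hypothetical and are not the API of bytemuck or zerocopy. It reinterprets a byte slice as `&[u32]` in place, but only after verifying alignment and length.

```rust
use std::mem::{align_of, size_of};

/// View `bytes` as a slice of u32 without copying.
/// Returns None if the buffer is misaligned for u32 or its length is not a
/// multiple of 4. The values are read in the machine's native byte order.
fn view_as_u32(bytes: &[u8]) -> Option<&[u32]> {
    let misaligned = bytes.as_ptr() as usize % align_of::<u32>() != 0;
    let ragged = bytes.len() % size_of::<u32>() != 0;
    if misaligned || ragged {
        return None;
    }
    // SAFETY: the pointer is non-null and aligned for u32 (checked above), the
    // new slice covers exactly the same bytes, every bit pattern is a valid
    // u32, and the result borrows `bytes`, so it cannot outlive the buffer.
    Some(unsafe {
        std::slice::from_raw_parts(bytes.as_ptr() as *const u32, bytes.len() / size_of::<u32>())
    })
}

/// The opposite direction is always allowed: u8 has alignment 1 and u32 has
/// no padding bytes.
fn as_bytes(words: &[u32]) -> &[u8] {
    // SAFETY: see the comment above; the length is the exact size of `words` in bytes.
    unsafe { std::slice::from_raw_parts(words.as_ptr() as *const u8, std::mem::size_of_val(words)) }
}
```

The `unsafe` blocks are exactly the part that is easy to get wrong by hand, which is why a checked, derive-based approach is attractive.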
You are correct, this works. There is even a file system built around this: https://github.com/fangfufu/httpdirfs
Work on custom allocators is underway; some of the std data structures already support them on nightly.
https://github.com/rust-lang/wg-allocators/issues/7
I wrote this for Node.js. It is a native binding in C that exposes cross-platform functionality: https://github.com/ronomon/direct-io
Although if it's a new project and you're used to C, I would also recommend taking a good look at Zig (https://ziglang.org/), because it is far more explicit about alignment than C and makes alignment a first-class part of the type system. This other comment of mine goes into more detail: https://news.ycombinator.com/item?id=25801542
Something that will also help is setting your minimum IO unit to 4096 bytes, the Advanced Format sector size, because then your Direct IO system will just work, regardless of whether sysadmins swap disks of different sector sizes out from underneath you. For example, a minimum IO unit of 4096 bytes will work not only for newer AF disks but also for any 512-byte-sector disks, since 4096 is a multiple of 512.
Lastly, Direct IO is actually more a property of the file system than of the OS (e.g. Linux), so you will find some file systems on Linux that return EINVAL when you try to open a file descriptor with O_DIRECT, simply because they don't support O_DIRECT (e.g. a macOS volume accessed from within a Linux VM). That should be your way of testing for support, rather than checking only the OS.
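A small sketch of what a fixed 4096-byte IO unit looks like in practice, written in Rust rather than C or Zig. The names `IO_UNIT`, `align_down`, `align_up`, `enclosing_range` and `AlignedBlock` are hypothetical helpers, not part of any library mentioned here. The idea is to widen every request to 4096-byte boundaries and to use a buffer type whose alignment is part of the type.

```rust
/// Minimum IO unit: the Advanced Format sector size. Must be a power of two.
const IO_UNIT: u64 = 4096;

/// Round `offset` down to the nearest multiple of IO_UNIT.
fn align_down(offset: u64) -> u64 {
    offset & !(IO_UNIT - 1)
}

/// Round `offset` up to the nearest multiple of IO_UNIT.
/// Returns None if rounding up would overflow u64.
fn align_up(offset: u64) -> Option<u64> {
    offset.checked_add(IO_UNIT - 1).map(align_down)
}

/// Widen an arbitrary byte range to the smallest enclosing range whose start
/// and length are both multiples of IO_UNIT. Returns (start, length).
fn enclosing_range(offset: u64, len: u64) -> Option<(u64, u64)> {
    let start = align_down(offset);
    let end = align_up(offset.checked_add(len)?)?;
    Some((start, end - start))
}

/// A buffer whose address is guaranteed to be a multiple of 4096, since
/// Direct IO generally expects the memory buffer to be aligned as well.
#[repr(C, align(4096))]
struct AlignedBlock([u8; 4096]);
```

Anything aligned to 4096 is automatically aligned to 512, so the same offsets and buffers stay valid if a 512-byte-sector disk is swapped in.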
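The "try it and check for EINVAL" test could be sketched like this in Rust. The `probe_direct_io` helper and the `DirectIo` enum are hypothetical names; the caller supplies the closure that actually attempts the O_DIRECT open, so the sketch itself stays platform-neutral. The value 22 for EINVAL is an assumption that holds on Linux, macOS and FreeBSD.

```rust
use std::io;

/// errno value for EINVAL (assumed: 22 on Linux, macOS and FreeBSD).
const EINVAL: i32 = 22;

/// Outcome of probing a file system for Direct IO support.
#[derive(Debug, PartialEq)]
enum DirectIo {
    Supported,
    Unsupported,
}

/// Run `open_direct`, a caller-supplied closure that tries to open a file
/// with O_DIRECT, and classify the result:
/// - success means the file system accepted O_DIRECT,
/// - EINVAL is treated as "this file system does not support O_DIRECT",
/// - any other error (missing file, permissions, ...) is passed through.
fn probe_direct_io<T>(open_direct: impl FnOnce() -> io::Result<T>) -> io::Result<DirectIo> {
    match open_direct() {
        Ok(_) => Ok(DirectIo::Supported),
        Err(e) if e.raw_os_error() == Some(EINVAL) => Ok(DirectIo::Unsupported),
        Err(e) => Err(e),
    }
}
```

On Linux the closure would typically be something like `OpenOptions::new().read(true).custom_flags(libc::O_DIRECT).open(path)`, using `std::os::unix::fs::OpenOptionsExt` and, as an assumption here, the `libc` crate for the flag constant. Probing a file on the target volume matters because the answer can differ between mounts on the same machine.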
Related posts
- Mount virtual http[s] iso command for progressive adaptive random access download with optional resumable download going to storage?
- How to browse a http archive from terminal?
- Use HTTP Remote with single file link
- I implemented the "Single File Mode" in HTTPDirFS, so you can now mount any arbitrary file served by a HTTP server in a virtual directory.