| | direct-io | imdb-rename |
|---|---|---|
| Mentions | 1 | 6 |
| Stars | 66 | 221 |
| Growth | - | - |
| Activity | 0.0 | 6.2 |
| Last commit | about 1 year ago | 2 months ago |
| Language | C | Rust |
| License | MIT License | The Unlicense |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
direct-io
- But How, Do Databases Use Mmap?
I wrote this for Node.js as a native binding in C that exposes cross-platform functionality: https://github.com/ronomon/direct-io
If it's a new project and you're used to C, though, I would also recommend taking a good look at Zig (https://ziglang.org/). Compared to C, it is far more explicit about alignment and makes alignment a first-class part of the type system. This other comment of mine goes into more detail: https://news.ycombinator.com/item?id=25801542
Something else that will help is setting your minimum IO unit to 4096 bytes, the Advanced Format sector size, because then your Direct IO system will just work, regardless of whether sysadmins swap disks of different sector sizes from underneath you. For example, a minimum IO unit of 4096 bytes will work not only for newer AF disks but also for any 512-byte sector disks, since 4096 is a multiple of 512.
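As a rough illustration of the 4096-byte minimum IO unit, here is a minimal C sketch that rounds lengths and offsets to that unit and allocates a suitably aligned buffer. The names `IO_UNIT`, `io_round_up`, `io_round_down` and `io_alloc` are hypothetical and are not taken from direct-io.

```c
#define _POSIX_C_SOURCE 200112L /* for posix_memalign */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumed minimum IO unit: the 4096-byte Advanced Format sector size. */
#define IO_UNIT ((size_t)4096)

/* 4096 is a multiple of 512, so anything aligned to IO_UNIT is also
 * aligned for disks with 512-byte sectors. */
_Static_assert(IO_UNIT % 512 == 0, "IO_UNIT must cover 512-byte sectors");

/* Round a length up to the next multiple of IO_UNIT.
 * (Overflow for lengths near SIZE_MAX is not handled in this sketch.) */
static size_t io_round_up(size_t n) {
    return (n + IO_UNIT - 1) / IO_UNIT * IO_UNIT;
}

/* Round a file offset down to a multiple of IO_UNIT. */
static uint64_t io_round_down(uint64_t offset) {
    return offset - (offset % IO_UNIT);
}

/* Allocate a buffer whose address and length are both multiples of
 * IO_UNIT. Stores the padded length in *out_len. Returns NULL on
 * failure; release the buffer with free(). */
static void *io_alloc(size_t len, size_t *out_len) {
    assert(out_len != NULL);
    void *buf = NULL;
    size_t padded = io_round_up(len);
    if (padded == 0) {
        padded = IO_UNIT; /* never hand out a zero-length buffer */
    }
    if (posix_memalign(&buf, IO_UNIT, padded) != 0) {
        return NULL;
    }
    *out_len = padded;
    return buf;
}
```

With these helpers, a read of 100 bytes at offset 5000 would be issued as a read starting at offset 4096 into a buffer from `io_alloc`, with the caller slicing out the bytes it actually wants. A read that straddles an `IO_UNIT` boundary would need a buffer covering both units.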
Lastly, Direct IO is really more a property of the file system than of the OS (e.g. Linux). You will find some file systems on Linux that return EINVAL when you try to open a file descriptor with O_DIRECT, simply because they don't support it (e.g. a macOS volume accessed from within a Linux VM). Attempting the open should therefore be your way of testing for support, rather than checking only the OS.
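The probe described above might look something like the following C sketch for Linux. The helper name `supports_o_direct` and its return convention are assumptions made for illustration, not part of direct-io.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* exposes O_DIRECT in <fcntl.h> on Linux/glibc */
#endif
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper. Returns:
 *    1  if `path` can be opened with O_DIRECT,
 *    0  if the open fails with EINVAL (file system lacks O_DIRECT support),
 *   -1  on any other error, or if this platform does not define O_DIRECT.
 */
static int supports_o_direct(const char *path) {
    assert(path != NULL);
#ifdef O_DIRECT
    int fd = open(path, O_RDONLY | O_DIRECT);
    if (fd >= 0) {
        close(fd);
        return 1;
    }
    return errno == EINVAL ? 0 : -1;
#else
    (void)path;
    errno = ENOTSUP;
    return -1;
#endif
}
```

The probe is per path rather than per OS, which matches the point that support depends on the file system holding the file. A result of -1 leaves `errno` set so the caller can tell a missing file from a permissions problem.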
imdb-rename
- IMDB-rename: A command line tool to rename media files based on titles from IMDB
- my rarbg magnet backup (268k)
I wrote a tool that did something related a while back using IMDb data: https://github.com/BurntSushi/imdb-rename
- Projects in rust
This might be of interest: https://github.com/BurntSushi/imdb-rename
- The technology behind GitHub’s new code search
What a shit take. The article itself is perhaps a nice light overview of 101-ish level concepts, although knowing how and when to apply them in a real engineering context is not something I would consider 101 level. And certainly, building something that is actually at the scale of GitHub Search is nowhere near 101 level.
This is what a 101-level inverted index implementation looks like: https://github.com/BurntSushi/imdb-rename
In other words, absolutely nothing like what GitHub built. Nowhere close.
- How to use mmap safely in Rust?
imdb-rename is an example of a tool that memory maps FSTs on disk in order to execute fulltext searches very quickly on the command line.
- But How, Do Databases Use Mmap?
> How else would you lazy-load a database of (say) 32GB into memory, almost instantly?
That's what the fst crate[1] does. It's likely working at a lower level of abstraction than you intend. But the point is that it works, is portable and doesn't require any cooperation from the OS other than the ability to memory map files. My imdb-rename tool[2] uses this technique to build an on-disk database for instantaneous searching. And then there is the regex-automata crate[3] that permits deserializing a regex instantaneously from any kind of slice of bytes.[4]
I think you should maybe provide some examples of what you're suggesting to make it more concrete.
[1] - https://crates.io/crates/fst
[2] - https://github.com/BurntSushi/imdb-rename
[3] - https://crates.io/crates/regex-automata
[4] - https://docs.rs/regex-automata/0.1.9/regex_automata/#example...
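To make the memory-mapping idea a little more concrete, here is a minimal C sketch of mapping a file read-only with POSIX `mmap`. It is not the fst crate's API and not how imdb-rename is implemented. It only shows the underlying OS facility, and the names `map_file` and `unmap_file` are hypothetical.

```c
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Map a whole file read-only. On success, returns a pointer to the
 * mapping and stores its length in *len. Returns NULL on failure,
 * including for empty files (a zero-length mapping is invalid).
 */
static const unsigned char *map_file(const char *path, size_t *len) {
    assert(path != NULL && len != NULL);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        return NULL;
    }
    struct stat st;
    if (fstat(fd, &st) != 0 || st.st_size <= 0) {
        close(fd);
        return NULL;
    }
    void *p = mmap(NULL, (size_t)st.st_size, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED) {
        return NULL;
    }
    *len = (size_t)st.st_size;
    return p;
}

/* Release a mapping obtained from map_file. */
static void unmap_file(const unsigned char *p, size_t len) {
    munmap((void *)p, len);
}
```

The call returns without reading the file's contents. Pages are typically faulted in by the OS only when the corresponding bytes are first touched, which is why a large on-disk structure can be "opened" almost instantly and then searched in place.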
What are some alternatives?
httpdirfs - A filesystem which allows you to mount HTTP directory listings or a single file, with a permanent cache. Now with Airsonic / Subsonic support!
project-safe-transmute - Project group working on the "safe transmute" feature
wg-allocators - Home of the Allocators working group: Paving a path for a standard set of allocator traits to be used in collections!
stack-graphs - Rust implementation of stack graphs
hh-suite - Remote protein homology detection suite.
lsif-clang - Language Server Indexing Format (LSIF) generator for C, C++ and Objective C
libuv - Cross-platform asynchronous I/O
textscanner
MMseqs2 - MMseqs2: ultra fast and sensitive search and clustering suite