monokakido
A Rust library for parsing and interpreting the Monokakido dictionary format. Full test coverage and efficient implementation with minimal dependencies.
I'm developing a library and a CLI tool to parse a certain dictionary format: https://github.com/golddranks/monokakido/ (the format used by a dictionary app called Monokakido: https://www.monokakido.jp/en/dictionaries/app/).
Then there are some recent developments like the Atomic memcpy RFC: https://github.com/rust-lang/rfcs/pull/3301 Memory maps aren't specifically mentioned, but they seem relevant. If mmap returning a &[AtomicPerByte] would solve the problem, I'd readily welcome it. Having an actual type to represent the (lack of) guarantees of the memory layout might actually bring some ergonomic benefits too. At the moment, if I go with read_volatile, I'd have to reimplement some basic stuff like string comparison and copying using volatile lookups.
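To make the "reimplement basic stuff" point concrete, here is a minimal sketch of what byte comparison and copying through read_volatile could look like. The names volatile_eq and volatile_copy are hypothetical helpers, not part of the monokakido crate or of any RFC, and the sketch does not settle whether volatile reads are actually sufficient for a mapping that another process may modify.

```rust
use std::ptr;

/// Compares the `needle.len()` bytes starting at `base` with `needle`,
/// reading each source byte with a volatile load.
///
/// # Safety
/// `base` must be valid for reads of `needle.len()` bytes.
/// (Hypothetical helper for illustration only.)
unsafe fn volatile_eq(base: *const u8, needle: &[u8]) -> bool {
    for (i, &expected) in needle.iter().enumerate() {
        // The compiler may not elide, merge, or repeat this read.
        let actual = unsafe { ptr::read_volatile(base.add(i)) };
        if actual != expected {
            return false;
        }
    }
    true
}

/// Copies `len` bytes starting at `base` into an owned buffer,
/// one volatile load per byte.
///
/// # Safety
/// `base` must be valid for reads of `len` bytes.
/// (Hypothetical helper for illustration only.)
unsafe fn volatile_copy(base: *const u8, len: usize) -> Vec<u8> {
    (0..len)
        .map(|i| unsafe { ptr::read_volatile(base.add(i)) })
        .collect()
}
```

In real use, base would point into the mapped region rather than into ordinary heap memory. The byte-at-a-time loops also show the cost: none of the optimized slice routines in the standard library can be reused.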
The solution for this is either to create a temp file and use reflink (note that this is a fork and we haven't released it yet) to create a CoW reference to the data in the temp file, or to use some other mechanism to prevent others from modifying the file (e.g. setting it to be read-only).
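For the read-only option, a minimal sketch using only the standard library might look like the following. The function name make_read_only is hypothetical. Note that the read-only flag is only a deterrent: the file's owner can flip it back, and handles that were already opened for writing are not affected.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// Marks the file at `path` read-only and reports whether the flag is set
/// afterwards. (Hypothetical helper for illustration only.)
fn make_read_only(path: &Path) -> io::Result<bool> {
    // Start from the current permissions so other bits are preserved.
    let mut perms = fs::metadata(path)?.permissions();
    perms.set_readonly(true);
    fs::set_permissions(path, perms)?;

    // Re-read the metadata to confirm the change took effect.
    Ok(fs::metadata(path)?.permissions().readonly())
}
```

One would call this before mapping the file, and treat an error or a false result as a reason to fall back to copying the data instead.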
The fst crate effectively relies on mmap for it to work right. The folks here suggesting you just use the heap might be right, but only if using the heap is actually plausible. If your dictionary is gigabytes in size (an FST might be bigger than available memory), then copying it to the heap first would be disastrous.
imdb-rename is an example of a tool that memory maps FSTs on disk in order to execute fulltext searches very quickly on the command line.