Identifying Rust's collect:<Vec<_>>() memory leak footgun

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • aho-corasick

    A fast implementation of Aho-Corasick in Rust.

  • Cross posting my comment from reddit[1] because I think it's interesting.

    -----

    Nice post. I love calling attention to this. Just a few months ago, I ran into the same problem, although it wasn't caused by `collect()`. It was caused by "normal" `Vec` usage:

    https://github.com/BurntSushi/aho-corasick/commit/474393be8d...

    The issue appeared when building large Aho-Corasick automatons. Otherwise, it usually doesn't matter too much because the absolute memory sizes are pretty small. But if your automaton uses 1GB of memory, then because of the excess capacity in a bunch of little `Vec` values, that usage could easily balloon to 2GB. And even at that size, _it was hard to notice that it was more than it should have been_! It is easy to just think that's normal memory usage. Especially when it's coming from the automaton representation that you know is less memory efficient. (What I'm talking about here is something I called a "non-contiguous NFA." The aho-corasick crate also has a "contiguous NFA" and a DFA. The "contiguous NFA" uses one contiguous `Vec` allocation with a bunch of bit-packing tricks.)

    But then someone actually reported an issue on the `ahocorasick_rs`[2] Python project (bindings to the `aho-corasick` crate) that the `pyahocorasick`[3] Python package (with a bespoke C implementation of Aho-Corasick) was using _substantially_ less memory. The contiguous NFA was still doing better, but the _peak_ memory usage of `ahocorasick_rs` was much much bigger than `pyahocorasick`. That prompted me to investigate and figure out what the heck was going wrong. (And this is one reason why benchmarks comparing tools or libraries are actually useful beyond just advertisements or flexing. They alert you to what is possible, and thus possibly push you to find bugs in your current implementation that might be otherwise easy to miss. In other words, benchmark comparisons can turn unknown unknowns into known unknowns.)

    So since this prompted me to look very carefully at memory usage, I noticed that the memory usage reported by `AhoCorasick::memory_usage`[4] was substantially smaller than peak RSS memory usage in a simple reproduction program of the original issue reported to `ahocorasick_rs`. I eventually figured out that was because of the excess capacity used by a zillion tiny `Vec`s in the non-contiguous representation. I fixed _most_ of that by calling `shrink_to_fit()`. But as the commit linked above notes, it wasn't really feasible to call `shrink_to_fit()` on all `Vec`s because that ended up with a non-trivial impact on construction time. And on top of all of that, memory usage was still worse than `pyahocorasick`.

    But why was I using a bunch of little `Vec`s in the first place? It's because this is the "non-contiguous" NFA. This is the thing that gets built from a list of patterns by first building a trie then filling in the failure transitions. That trie construction really demands random access mutation, which puts a constraint on your data structure choices. Plus, I had designed the contiguous NFA as a release valve of sorts that lifts this constraint. So I reasoned, "sure, the non-contiguous NFA isn't designed to be fast. It just needs to get us started." But still, `pyahocorasick` only has one Aho-Corasick implementation (the `aho-corasick` crate has 3). So it must be doing something differently.

    It turns out that `pyahocorasick` uses linked lists! THE HORROR! But actually, they are a really good solution to this problem. As a Rust programmer, I reach for `Vec` first. But a C programmer reaches for linked lists first. And it turns out that linked lists are a much better fit here.

    And so, I switched to linked lists[5]. (But using indices instead of pointers. Because of course. This is Rust. :-)) And now `ahocorasick_rs` uses less memory than `pyahocorasick`!

    [1]: https://old.reddit.com/r/rust/comments/199jycb/identifying_r...

    [2]: https://pypi.org/project/ahocorasick-rs/

    [3]: https://pypi.org/project/pyahocorasick/

    [4]: https://docs.rs/aho-corasick/latest/aho_corasick/struct.AhoC...

    [5]: https://github.com/BurntSushi/aho-corasick/commit/31bb1fc30d...

  • rust

    Empowering everyone to build reliable and efficient software.

  • This is a good time to check how your code performs in beta vs stable. This particular case is new in beta and it would be interesting to catch regressions before they land.

    Someone already filed this as a bug: https://github.com/rust-lang/rust/issues/120091

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts