Rust std::fs slower than Python

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com.

  • jemalloc

  • Be aware `jemalloc` will make you suffer the observability issues of `MADV_FREE`. `htop` will no longer show the truth about how much memory is in use.

    * https://github.com/jemalloc/jemalloc/issues/387#issuecomment...

    * https://gitlab.haskell.org/ghc/ghc/-/issues/17411

    Apparently now `jemalloc` will call `MADV_DONTNEED` 10 seconds after `MADV_FREE`:

  • julia

    The Julia Programming Language

  • https://github.com/JuliaLang/julia/issues/51086#issuecomment...

    So while this "fixes" the issue, it'll introduce a confusing time delay between you freeing the memory and you observing that in `htop`.

    But according to https://jemalloc.net/jemalloc.3.html you can set `opt.muzzy_decay_ms = 0` to remove the delay.

    Still, the musl author has some reservations against making `jemalloc` the default:

    https://www.openwall.com/lists/musl/2018/04/23/2

    > It's got serious bloat problems, problems with undermining ASLR, and is optimized pretty much only for being as fast as possible without caring how much memory you use.

    With the above-mentioned tunables, this should be mitigated to some extent, but the general "theme" (focusing on e.g. performance vs memory usage) will likely still mean "it's a tradeoff" or "it's no tradeoff, but only if you set tunables to what you need".
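As a rough illustration of the tunable mentioned above, the sketch below prepares a child process with `muzzy_decay_ms` set to 0 through the `MALLOC_CONF` environment variable. It assumes an unprefixed jemalloc build that reads `MALLOC_CONF`; builds configured with a symbol prefix may read a differently named variable. The binary name `./my-server` and both helper functions are hypothetical.

```rust
use std::process::Command;

/// Build a jemalloc option string that removes the muzzy-page decay delay.
/// The option name comes from the jemalloc man page linked above.
fn jemalloc_conf_no_muzzy_delay() -> String {
    format!("muzzy_decay_ms:{}", 0)
}

/// Prepare (but do not spawn) a command whose environment carries the option.
/// Assumption: the target binary uses an unprefixed jemalloc build.
fn command_with_jemalloc_conf(program: &str) -> Command {
    let mut cmd = Command::new(program);
    cmd.env("MALLOC_CONF", jemalloc_conf_no_muzzy_delay());
    cmd
}

fn main() {
    // "./my-server" is a hypothetical binary linked against jemalloc.
    let cmd = command_with_jemalloc_conf("./my-server");

    // Inspect the environment overrides without launching anything.
    for (key, value) in cmd.get_envs() {
        println!("{:?} = {:?}", key, value);
    }
}
```

The command is only constructed, not spawned, so the sketch runs without jemalloc installed; calling `cmd.status()` or `cmd.spawn()` would launch the real process.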

  • Presto

    The official home of the Presto distributed SQL query engine for big data

  • Note that glibc has a similar problem in multithreaded contexts. It strands unused memory in thread-local pools, which grows your memory usage over time like a memory leak. We got lower memory usage that didn't grow over time by switching to jemalloc.

    Example of this: https://github.com/prestodb/presto/issues/8993

  • opendal

    Apache OpenDAL: access data freely.

  • Totally unrelated but: this post talks about the bug being first discovered in OpenDAL [1], which seems to be an Apache (Incubator) project to add an abstraction layer for storage over several types of storage backend. What's the point/use case of such an abstraction? Anybody using it?

    [1] https://opendal.apache.org/

  • mimalloc_rust

    A Rust wrapper over Microsoft's MiMalloc memory allocator

  • > I wish Rust would switch to mimalloc or the latest tcmalloc (not the one in gperftools).

    That's nonsensical. Rust uses the system allocators for compatibility, not because they're good (they were not when Rust switched away from jemalloc, and they aren't now).

    If you want to use mimalloc in your Rust programs, you can just set it as the global allocator; that takes all of three lines: https://github.com/purpleprotocol/mimalloc_rust#usage
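To show the mechanism the comment refers to without pulling in an external crate, here is a minimal std-only sketch of `#[global_allocator]`. `CountingAlloc` and `ALLOCATIONS` are hypothetical names; the wrapper simply forwards to the system allocator and counts calls. With the `mimalloc` crate, the static would instead be declared with that crate's `MiMalloc` type, as described in the README linked above.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards every request to the system allocator
/// and counts how many allocations were made.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// This attribute is what selects the allocator for the whole program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn main() {
    let before = ALLOCATIONS.load(Ordering::Relaxed);

    // black_box keeps the optimizer from eliding the allocation.
    let v: Vec<u64> = Vec::with_capacity(1024);
    black_box(&v);

    let after = ALLOCATIONS.load(Ordering::Relaxed);
    println!("allocations observed: {}", after - before);
}
```

Swapping allocators this way affects only the Rust side of the program; code that calls the C library's `malloc` directly keeps using the system allocator.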

  • rust

    Empowering everyone to build reliable and efficient software.

  • > I know it’s easy to change but the arguments for using glibc’s allocator are less clear to me:

    You can find them in the original motivation for removing jemalloc, 7 years ago: https://github.com/rust-lang/rust/issues/36963

    Also, it's not "glibc's allocator", it's the system allocator. If you're unhappy with glibc's, get that replaced.

    > 1. Reliability - how is an alternate allocator less reliable?

    Jemalloc had to be disabled on various platforms and architectures; there is no reason to think mimalloc or tcmalloc are any different.

    The system allocator, while shit, is always there and functional; the project does not have to curate its availability across platforms.

    > 2. Compatibility - again sounds like a FUD argument. How is compatibility reduced by swapping out the allocator?

    It makes interactions with anything which does use the system allocator worse, and almost certainly fails to interact correctly with some of the more specialised system facilities (e.g. malloc.conf) or tooling (in Rust, jemalloc as shipped did not work with valgrind).

    > Also, most people aren’t writing hello world applications

    Most people aren't writing applications bound on allocation throughput either.

    > so the default should probably be for a good allocator.

    Probably not, no.

    > I’d also note that having a dependency of the std runtime on glibc in the first place likely bloats your binary more than the specific allocator selected.

    That makes no sense whatsoever. The libc is the system's and dynamically linked. And changing allocator does not magically unlink it.

    > 4. Maintenance burden - I don’t really buy this argument.

    It doesn't matter that you don't buy it. Having to ship, resync, debug, and curate (cf. (1)) an allocator is a maintenance burden. With a system allocator, all the project does is ensure it calls the system allocators correctly; the rest is out of its purview.

  • CPython

    The Python programming language

  • You can look at the history of PyObject yourself: https://github.com/python/cpython/commits/main/Include/objec.... None of these changes were made because of weird CPU errata that meant that making the header bigger was a performance win. That isn't to say that the developers wouldn't be interested in such effects, or be able to detect them, but the fact that the object header happens to be large enough to avoid the performance bug isn't because of careful testing but because that's what they ended up with for other reasons, long before Zen 3 was ever released. If it so happened that Python was affected because the offset needed to avoid a penalty was 0x50 or something, then I am sure they would take it up with AMD rather than being content to increase the size of their header for no reason.
