Metric | jemalloc | gperftools |
---|---|---|
Mentions | 42 | 4 |
Stars | 9,602 | 8,498 |
Growth | 0.9% | 0.6% |
Activity | 8.7 | 9.4 |
Last commit | 5 days ago | 13 days ago |
Language | C | C++ |
License | GNU General Public License v3.0 or later | BSD 3-clause "New" or "Revised" License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
jemalloc
- Show HN: Rust Web Framework
jemalloc (as opposed to GNU libc and LLVM) sometimes performs better. [1]
[1] https://jemalloc.net/
- Training AI Models on CPU on AWS EC2
There are a number of opportunities for optimizing the use of the underlying CPU resources. These include optimizing memory management and thread allocation to the structure of the underlying CPU hardware. Memory management can be improved through the use of advanced memory allocators (such as jemalloc and TCMalloc) and/or by reducing slower memory accesses (i.e., those across NUMA nodes). Thread allocation can be improved through appropriate configuration of the OpenMP threading library and/or use of Intel's OpenMP library.
- Adding 16 KB Page Size to Android
Certain build processes determine the page size at compile time and assume it's the same at run time, and fail if it is not: https://github.com/jemalloc/jemalloc/issues/467
Some memory-mapped file formats have assumptions about page granularity: https://bugzilla.redhat.com/show_bug.cgi?id=1979804
The file format issue applies to ELF as well. Some people patch their toolchains (or use suitable linker options) to produce slightly smaller binaries that can only be loaded if the page size is 4K, even though the ABI is pretty clear that you should link for compatibility with up to 64K pages.
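
The compile-time versus run-time mismatch described above can be avoided by asking the operating system for the page size when the program starts. The following is a minimal sketch, assuming a POSIX system; `runtime_page_size` and `round_up_to_page` are hypothetical helper names chosen for illustration and are not taken from jemalloc or Android.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper: query the page size at run time instead of
 * baking in a compile-time constant such as 4096. */
size_t runtime_page_size(void) {
    long sz = sysconf(_SC_PAGESIZE);
    assert(sz > 0); /* sysconf returns -1 on error */
    return (size_t)sz;
}

/* Hypothetical helper: round len up to a whole number of pages.
 * Relies on the page size being a power of two. */
size_t round_up_to_page(size_t len) {
    size_t page = runtime_page_size();
    return (len + page - 1) & ~(page - 1);
}
```

With this approach, the same binary computes mapping sizes correctly whether the kernel uses 4 KB, 16 KB, or 64 KB pages.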
- Mimalloc Cigarette: Losing one week of my life catching a memory leak (Rust)
jemalloc has its own problem with threads - if you have a multi-threaded application that uses jemalloc on all threads except the main thread, then the cleanup that jemalloc runs on main thread exit will segfault. In $dayjob we use jemalloc as a sub-allocator in specific arenas (the rest of the application uses libc malloc, but for some cases it allocates pages using mmap and then uses jemalloc to partition them). $dayjob's code is written in Rust, whose unit test framework defaults to running tests in one or more threads and the main thread of the test binary just orchestrates them. So the test binary triggers this segfault reliably.
(https://github.com/jemalloc/jemalloc/issues/1317. Unlike what the title says, it's not Windows-specific.)
- Resource observability case study: jemalloc in Android builds
As a demonstration, I want to measure something that caught my attention months ago—an interesting topic brought up by Jason Pearson: the use of jemalloc as a native memory allocator for Android builds. The initial claim is that this usage brings a reduction in memory usage by optimizing how memory is allocated and deallocated. jemalloc is designed to minimize memory fragmentation and improve performance, particularly in multithreaded applications, making it ideal for resource-intensive builds.
- Userland Rootkits Are Lame
- Show HN: Comprehensive inter-process communication (IPC) toolkit in modern C++
- Finding memory leaks in Postgres C code
jemalloc also has some handy leak and memory profiling abilities: https://github.com/jemalloc/jemalloc/wiki/Use-Case%3A-Heap-P...
- Speed of Rust vs. C
The worst memory performance bug I ever saw turned out to be heap fragmentation in a non-GC system. There are memory allocators that solve this, like https://github.com/jemalloc/jemalloc/tree/dev, but ... they do it by effectively running a GC at the block level.
As soon as you use atomic counters in a multi-threaded system you can wave goodbye to your scalability too!
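
The scalability concern with atomic counters comes from every thread writing to the same cache line. The sketch below contrasts one shared atomic counter with per-thread counters that are summed at the end. It is illustrative only: the thread count, iteration count, and the 64-byte cache-line size are assumptions, all names are hypothetical, and the code checks correctness rather than measuring speed.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_THREADS 4
#define INCREMENTS_PER_THREAD 100000

/* Variant A: all threads update one shared atomic, so its cache line
 * is contended by every core. */
static atomic_uint_fast64_t shared_counter;

static void *bump_shared(void *arg) {
    (void)arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        atomic_fetch_add_explicit(&shared_counter, 1, memory_order_relaxed);
    return NULL;
}

/* Variant B: each thread owns a counter aligned to its own cache line
 * (64 bytes is an assumption), so no line is shared between writers. */
struct padded_counter {
    _Alignas(64) uint64_t value;
};
static struct padded_counter local_counters[NUM_THREADS];

static void *bump_local(void *arg) {
    struct padded_counter *c = arg;
    for (int i = 0; i < INCREMENTS_PER_THREAD; i++)
        c->value++;
    return NULL;
}

uint64_t count_with_shared_atomic(void) {
    pthread_t threads[NUM_THREADS];
    atomic_store(&shared_counter, 0);
    for (int t = 0; t < NUM_THREADS; t++) {
        int rc = pthread_create(&threads[t], NULL, bump_shared, NULL);
        assert(rc == 0);
    }
    for (int t = 0; t < NUM_THREADS; t++)
        pthread_join(threads[t], NULL);
    return (uint64_t)atomic_load(&shared_counter);
}

uint64_t count_with_sharded_counters(void) {
    pthread_t threads[NUM_THREADS];
    uint64_t total = 0;
    for (int t = 0; t < NUM_THREADS; t++) {
        local_counters[t].value = 0;
        int rc = pthread_create(&threads[t], NULL, bump_local, &local_counters[t]);
        assert(rc == 0);
    }
    /* pthread_join synchronizes with thread exit, so reading the
     * plain counters afterwards is safe. */
    for (int t = 0; t < NUM_THREADS; t++) {
        pthread_join(threads[t], NULL);
        total += local_counters[t].value;
    }
    return total;
}
```

Both functions return the same total. The sharded variant trades an extra summation step for the absence of cross-core contention, which is the kind of trade-off the comment above alludes to.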
- Understanding Mesh Allocator
The linked talk video mentioned they're playing with it in jemalloc and tcmalloc.
I found this https://github.com/jemalloc/jemalloc/issues/1440 but couldn't find tcmalloc doing anything similar.
These guys are aware of mesh and compare against it: https://abelay.github.io/6828seminar/papers/maas:llama.pdf
gperftools
- I find it's not possible to do serious C/C++ coding on latest macOS
For profiling, you are right: clang has no -pg that works. But there are options: since clang supports PGO, the -fprofile flags could be what you need; they will generate a profraw file for you. There is also gperftools, which works on more than just Linux. https://github.com/gperftools/gperftools
- Why So Slow? Using Profilers to Pinpoint the Reasons of Performance Degradation
Because we couldn't identify the issue using the results we got from Callgrind, we reached for another profiler, gperftools. It's a sampling profiler and therefore has a smaller impact on the application's performance in exchange for less accurate call statistics. After filtering out the unimportant parts and visualizing the rest with pprof, it was evident that something strange was happening with the send function. It took only 71 milliseconds with the previous implementation and more than 900 milliseconds with the new implementation of our Bolt server. It was very suspicious, but based on Callgrind, its cost was almost the same as before. We were confused as the two results seemed to conflict with each other.
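
To make the "sampling profiler" idea concrete, here is a toy sketch that uses a POSIX CPU-time timer (`ITIMER_PROF`) to deliver `SIGPROF` at a fixed interval and counts which region of the program was running at each tick. It is not gperftools code: a real profiler records stack traces, whereas this sketch only tracks a hand-set region marker, and every name in it is hypothetical.

```c
#define _XOPEN_SOURCE 700
#include <assert.h>
#include <signal.h>
#include <string.h>
#include <sys/time.h>

enum { REGION_IDLE, REGION_PARSE, REGION_SEND, REGION_COUNT };

/* Region the program is currently in; set by the workload, read by the handler. */
static volatile sig_atomic_t current_region = REGION_IDLE;
/* Number of timer ticks observed while each region was active. */
static volatile sig_atomic_t samples[REGION_COUNT];

/* Signal handler: only touches sig_atomic_t data, so it is async-signal-safe. */
static void on_sigprof(int signo) {
    (void)signo;
    samples[current_region]++;
}

/* Start sampling every interval_usec microseconds of consumed CPU time
 * (interval_usec must be below 1,000,000). Returns 0 on success. */
int sampler_start(long interval_usec) {
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = on_sigprof;
    sa.sa_flags = SA_RESTART;
    sigemptyset(&sa.sa_mask);
    if (sigaction(SIGPROF, &sa, NULL) != 0)
        return -1;

    struct itimerval tv;
    tv.it_interval.tv_sec = 0;
    tv.it_interval.tv_usec = interval_usec;
    tv.it_value = tv.it_interval;
    return setitimer(ITIMER_PROF, &tv, NULL);
}

/* Disarm the timer. */
void sampler_stop(void) {
    struct itimerval off;
    memset(&off, 0, sizeof off);
    setitimer(ITIMER_PROF, &off, NULL);
}

/* Stand-in workload: burn CPU inside `region` until it has been sampled
 * at least `wanted` times. */
void spin_in_region(int region, int wanted) {
    assert(region >= 0 && region < REGION_COUNT);
    current_region = region;
    while (samples[region] < wanted) {
        /* busy loop; the volatile read keeps it from being optimized away */
    }
    current_region = REGION_IDLE;
}

int samples_in(int region) {
    return samples[region];
}
```

Because the program is only interrupted at each tick, overhead stays low, but anything that happens between ticks is invisible. That is the accuracy trade-off described above, and one reason a sampling profiler and an instrumenting tool such as Callgrind can report different costs for the same function.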
- Is there a way I can visualize all the function calls made while running the project (C++) in a graphical way?
gperftools (https://github.com/gperftools/gperftools) can be easily plugged in using LD_PRELOAD and a signal, and has a nice visualization tool implemented in Go: https://github.com/google/pprof.
- How do applications request for RAM from the CPU?
Google's tcmalloc
What are some alternatives?
mimalloc - mimalloc is a compact general purpose allocator with excellent performance.
pprof - pprof is a tool for visualization and analysis of profiling data
tbb - oneAPI Threading Building Blocks (oneTBB) [Moved to: https://github.com/oneapi-src/oneTBB]
massif-visualizer - Visualizer for Valgrind Massif data files
rust-scudo
rpmalloc - Public domain cross platform lock free thread caching 16-byte aligned memory allocator implemented in C
tracy - Frame profiler
Hoard - The Hoard Memory Allocator: A Fast, Scalable, and Memory-efficient Malloc for Linux, Windows, and Mac.
minitrace - Simple C/C++ library for producing JSON traces suitable for Chrome's built-in trace viewer (about:tracing).
Redis - Redis is an in-memory database that persists on disk. The data model is key-value, but many different kinds of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.
gprof2dot - Converts profiling output to a dot graph.