| | tcmalloc | glibc |
|---|---|---|
| Mentions | 15 | 45 |
| Stars | 4,081 | 1,213 |
| Growth (stars) | 1.4% | 3.2% |
| Activity | 9.8 | 9.8 |
| Latest commit | 3 days ago | 9 days ago |
| Language | C++ | C |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
tcmalloc
-
Configuring HugePages on Google's TCMalloc
https://github.com/google/tcmalloc/issues/190
-
Configuring HugePages on TCMalloc
I had earlier raised a query on github.com/google/tcmalloc regarding how I can force tcmalloc to back memory with hugetlbfs instead of using Transparent Huge Pages. I have attached the link to my query below. Please let me know if there is any possible way to do this.
-
New memory related fields in Yugabyte 2.17.3 pg_stat_activity
The allocated_mem_bytes field shows the memory allocated by the memory allocator. PostgreSQL is set up in an extensible way, which includes the ability to choose a memory allocator; for PostgreSQL this is ptmalloc, and for YSQL it is tcmalloc. PostgreSQL has the ability to change the memory allocator, but by default uses the operating system's memory allocator.
-
Spotting and Avoiding Heap Fragmentation in Rust Applications
> * Switching from libc malloc to tcmalloc (dating myself a little bit)
If you think of tcmalloc as an old crusty allocator, you've probably only seen the gperftools version of it.
This is the version Google now uses internally: https://github.com/google/tcmalloc
It's worth a fresh look. In particular, it supports per-CPU caches as an alternative to per-thread caches. Those are fantastic if you have a lot more threads than CPUs.
-
I've had bad luck with transparent hugepages on my Linux machines
The default setting of max_ptes_none is also problematic.
On a stock kernel, it's 511. TCMalloc's docs recommend using max_ptes_none set to 0 for this reason: https://github.com/google/tcmalloc/blob/master/docs/tuning.m...
(Disclosure: I work on TCMalloc and authored the above doc.)
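The knob the doc refers to lives in sysfs; a minimal config sketch (requires root, and assumes THP and khugepaged are enabled on the machine):

```shell
# Stock kernels ship max_ptes_none = 511, letting khugepaged collapse a
# range into a huge page even if almost all of it was never touched.
cat /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none

# Setting it to 0 (per the TCMalloc tuning doc) stops khugepaged from
# pulling previously-unmapped pages into a collapse.
echo 0 | sudo tee /sys/kernel/mm/transparent_hugepage/khugepaged/max_ptes_none
```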
-
Pages Are a Good Idea
The easiest way to exploit THP, by far, is to link your program against TCMalloc and forget about it. Literally free money. Highly recommended.
https://github.com/google/tcmalloc
- Why tcmalloc using aggressive decommit == false is a little better than jemalloc
- System memory allocator free operation zeroes out deallocated blocks in iOS 16
-
malloc() and free() are a bad API
This means that an efficient malloc implementation is typically quite complicated. mimalloc, for example, is almost 8K lines of C afaik, and it is one of the smaller but still efficient malloc implementations I'm aware of. (Try looking into tcmalloc for comparison.)
-
malloc global mutex?
Yes, it is synchronized, and you can typically swap out the implementation as well. There are different allocators out there depending on what you are trying to optimize for (memory, single-thread performance, multithread performance, locality, etc.). Many of the multithreading-optimized ones use per-thread pools, so each individual allocation doesn't need a global lock; changing the pools themselves does, as do large allocations that aren't part of the pools. For example https://github.com/google/tcmalloc
glibc
- I cut GTA Online loading times by 70% (2021)
-
Cray-1 performance vs. modern CPUs
I wonder if you’re using a different definition of ‘vectorized’ from the one I would use. For example glibc provides a vectorized strlen. Here is the sse version: https://github.com/bminor/glibc/blob/master/sysdeps/x86_64/m...
It’s pretty simple to imagine how to write an unoptimized version: read a vector from the start of the string, compare it to 0, convert that to a bitvector, test for equal to zero, then loop or clz and finish.
I would call this vectorized because it operates on 16 bytes (sse) at a time.
There are a few issues:
1. You’re still spending a lot of time in the scalar code checking loop conditions.
2. You’re doing unaligned reads which are slower on old processors
3. You may read across a cache line forcing you to pull a second line into cache even if the string ends before then.
4. You may read across a page boundary which could cause a segfault if the next page is not accessible
So the fixes are to do 64-byte (i.e. cache-line) aligned accesses, which also means page-aligned (so you won't read from a page until you know the string doesn't end in the previous page). That deals with the alignment problems.

You read four vector registers at a time, but this doesn't really cost much more if the string is shorter, as it all comes from one cache line. Another trick in the linked code is that it first finds the cache line by reading the first 16 bytes, then merging in the next 3 groups with unsigned-min, so it only requires one test against a zero vector instead of 4. Then it finds the zero in the cache line.

You need to do a bit of work in the first iteration to become aligned. With AVX, you can use mask registers on reads to handle that first step instead.
-
Setenv Is Not Thread Safe and C Doesn't Want to Fix It
That was also my thought. To my knowledge `/etc/localtime` is the creation of Arthur David Olson, the founder of the tz database (now maintained by IANA), but his code never read `/etc/localtime` multiple times unless the `TZ` environment variable was changed. Tzcode made it into glibc, but Ulrich Drepper changed it to not cache `/etc/localtime` when `TZ` is unset [1]; I wasn't able to locate the exact rationale, given that the commit is very old (1996-12) and no mailing list archive is available for that time period.
[1] https://github.com/bminor/glibc/commit/68dbb3a69e78e24a778c6...
-
CTF Writeup: Abusing select() to factor RSA
That's not really what the problem is. The actual code is fine.
The issue is that the definition of `fd_set` has a constant size [1]. If you allocate the memory yourself, the select() system call will work with as many file descriptors as you care to pass to it. You can see that both glibc [2] and the kernel [3] support arbitrarily large arrays.
[1] https://github.com/bminor/glibc/blob/master/misc/sys/select....
[2] https://github.com/bminor/glibc/blob/master/sysdeps/unix/sys...
[3] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/lin...
-
How are threads created in Linux x86_64
The source code for that is here.
-
Using Uninitialized Memory for Fun and Profit (2008)
Expanding the macro gives three GCC function attributes [2]: `__attribute__ ((malloc))`, `__attribute__ ((alloc_size(1)))` and `__attribute__ ((warn_unused_result))`. They are required for GCC (and other compilers recognizing them) to actually ensure that the functions behave as the standard dictates. Your own malloc-like functions won't be treated the same unless you give them similar attributes.
[1] https://github.com/bminor/glibc/blob/807690610916df8aef17cd1...
[2] https://gcc.gnu.org/onlinedocs/gcc/Common-Function-Attribute...
-
“csinc”, the AArch64 instruction you didn’t know you wanted
IFunc relocations are what enable glibc to dynamically choose the best memcpy routine to use at runtime based on the CPU.
see https://github.com/bminor/glibc/blob/glibc-2.31/sysdeps/x86_...
-
memmove() implementation in strictly conforming C -- possible?
memmove can very well be implemented in pure C; libc implementations usually have a "generic" (meaning architecture-independent) fallback. Here is musl's generic implementation and its x86-64 assembly implementation. For glibc the implementation is a bit more complex, with multiple architecture-specific versions, but you can find a generic implementation in these two files: memmove.c and generic/memcopy.h.
-
Fedora 38 LLVM vs. Team Fortress 2
Yeah, looks like the Q_strcat(pszContentPath, "/"); is invalid, as glibc has only allocated exactly enough to fit the path in the buffer returned by realpath().
Interestingly, the open group spec says that a null argument to realpath is "Implementation defined" [0]
And the linux (glibc) man pages say it allocates a buffer "Up to PATH_MAX" [1]
I guess "strlen(path)" is "up to PATH_MAX", but the man page seems unclear: you could read it as implying the buffer is always allocated at PATH_MAX size, but that's not what seems to be happening; it's effectively just calling strdup() [2]. I have no idea how to feed back to the Linux man pages, but it might be worth clarifying there.
[0] https://pubs.opengroup.org/onlinepubs/009696799/functions/re...
[1] https://linux.die.net/man/3/realpath
[2] https://github.com/bminor/glibc/blob/0b9d2d4a76508fdcbd9f421...
-
Method implementations
For the actual sources you will have to look at one of the mirrors of the C standard library, such as https://github.com/bminor/glibc/tree/master/sysdeps/ieee754/dbl-64