Our great sponsors
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
are-we-fast-yet
Are We Fast Yet? Comparing Language Implementations with Objects, Closures, and Arrays
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
I have been using the Boehm-Demers-Weiser GC for jank [1], the native Clojure dialect on LLVM with C++ interop. Since jank is a Clojure dialect, and Clojure is built primarily on persistent, immutable data structures, there are potentially a lot of references to objects, across many threads, and there's a lot of garbage being churned. I originally started with reference counting and RAII, using boost::intrusive_ptr and an atomic count. The GC was actually 2x faster, in the sequence benchmark I was trying to optimize.
At this point, jank is generally several times faster than the equivalent Clojure JVM code. I'm sure there are better GCs out there, in terms of performance, and I have my eye on MMTK [2] for a future upgrade, but the fact that remains is this: the Boehm GC is stupid simple to integrate and it's surprisingly fast. Compare it to MPS, MMTK, and others and both the documentation and the actual dev work required are worlds apart.
For a project which needs a GC but doesn't need to pick the absolute best one first, it seems like the best option, based on my research.
1: https://jank-lang.org/
2: https://www.mmtk.io/code
https://chromium.googlesource.com/v8/v8.git/+/HEAD/include/c...
Due to the nature of web engine workloads migrating objects to being GC'd isn't performance negative (as most people would expect). With care it can often end up performance positive.
There are a few tricks that Oilpan can apply. Concurrent tracing helps a lot (e.g. instead of incrementing/decrementing refs, you can trace on a different thread), in addition when destructing objects, the destructors typically become trivial meaning the object can just be dropped from memory. Both these free up main thread time. (The tradeoff with concurrent tracing is that you need atomic barriers when assigning pointers which needs care).
This is on top of the safey improvements you gain from being GC'd vs. smart pointers, etc.
One major tradeoff that UAF bugs become more difficult to fix, as you are just accessing objects which "should" be dead.
I have a library which has an extremely slow free, around 2m for large files, because of unnaturally scattered allocation patterns, but this old conservative GC didn't help at all. It was about 40% slower with libgc. mimalloc was a bit better. Best would be a properly fast GC, like mps https://github.com/Ravenbrook/mps, but this would be too much work.
The compiler support you need is quite limited. Here's an implementation of cycle collection in Rust: https://github.com/chc4/samsara It's made possible because Rust can tell apart read-only and read-write references (except for interior mutable objects, but these are known to the compiler and references to them can be treated as read-write). This avoids a global stop-the-world for the entire program.
Cascading deletes are rare in practice, and if anything they are inherent to deterministic deletion, which is often a desirable property. When they're possible, one can often use arena allocation to avoid the issue altogether, since arenas are managed as a single object.
> Sure there's a small overhead to smart pointers
Not so small, and it has the potential to significantly speed down an application when not used wisely. Here are e.g. some measurements where the programmer used C++11 and did everything with smart pointers: https://github.com/smarr/are-we-fast-yet/issues/80#issuecomm.... There was a speed down between factor 2 and 10 compared with the C++98 implementation. Also remember that smart pointers create memory leaks when used with circular references, and there is an additional memory allocation involved with each smart pointer.
> Garbage collection has an overhead too of course
The Boehm GC is surprisingly efficient. See e.g. these measurements: https://github.com/rochus-keller/Oberon/blob/master/testcase.... The same benchmark suite as above is compared with different versions of Mono (using the generational GC) and the C code (using Boehm GC) generated with my Oberon compiler. The latter only is 20% slower than the native C++98 version, and still twice as fast as Mono 5.
> Sure there's a small overhead to smart pointers
Not so small, and it has the potential to significantly speed down an application when not used wisely. Here are e.g. some measurements where the programmer used C++11 and did everything with smart pointers: https://github.com/smarr/are-we-fast-yet/issues/80#issuecomm.... There was a speed down between factor 2 and 10 compared with the C++98 implementation. Also remember that smart pointers create memory leaks when used with circular references, and there is an additional memory allocation involved with each smart pointer.
> Garbage collection has an overhead too of course
The Boehm GC is surprisingly efficient. See e.g. these measurements: https://github.com/rochus-keller/Oberon/blob/master/testcase.... The same benchmark suite as above is compared with different versions of Mono (using the generational GC) and the C code (using Boehm GC) generated with my Oberon compiler. The latter only is 20% slower than the native C++98 version, and still twice as fast as Mono 5.
> register scanning isn't portable
Certainly not but it wasn't particularly hard to implement either. I just wrote some inline assembly for every architecture. Here's my programming language's x86_64 and aarch64 implementations:
https://github.com/lone-lang/lone/blob/master/architecture/x...
https://github.com/lone-lang/lone/blob/master/architecture/a...
You can look at the SGCL garbage collector for C++: https://github.com/pebal/sgcl. It works in a separate thread, is locks-free and never stops the world.
Related posts
- Making your project available through Homebrew
- Suggest an interesting language for me to try out, that I can use for 2D Games. Something that I might not have considered, or is not particularly well known.
- Rock v0.2.1, a little native toy language I've made with Rust and LLVM.
- What parsing techniques do you use to support a good language server?
- Understanding tail-call optimization