llvm-propeller
BOLT
| | llvm-propeller | BOLT |
|---|---|---|
| Mentions | 6 | 10 |
| Stars | 332 | 2,487 |
| Growth | 0.9% | - |
| Activity | 0.0 | 0.0 |
| Last commit | 7 months ago | about 1 year ago |
| Language | Shell | C++ |
| License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
llvm-propeller
-
Speed of Rust vs. C
> In theory, Rust allows even better optimizations than C thanks to stricter immutability and aliasing rules, but in practice this doesn't happen yet. Optimizations beyond what C does are a work-in-progress in LLVM, so Rust still hasn't reached its full potential.
Really glad to see this mentioned, I search for it in every post like this.
LLVM should follow now that the mutable noalias bugs appear to be fixed. Remember when it was a meme that every time it got enabled in rustc, it had to be disabled again in the next .1 release? It's now been left enabled for years.
The biggest challenge here is that most people aren't looking for optimizations beyond what they were already getting with C. Like when Google was switching from GCC to Clang: they'd pick apart the generated asm and file bugs until Clang and LLVM converged on GCC's output. That was a big part of why the performance gap closed over the years. But that was closing a gap, not speculatively overtaking GCC.
Then there's a whole other angle; we now know just how much machine code layout can affect real-world performance, in most cases much more than aliasing rules, but we still haven't made it a seamless part of our build tooling. PGO is now fairly easy, but BOLT is still a pain to integrate for no clear reason, and PROPELLER has been left in an embarrassing state where even its own PDF link is broken [1]. (To be fair, it may not be Google's highest priority at this time.)
[1] https://github.com/google/llvm-propeller
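For readers who haven't tried it, the BOLT workflow the parent alludes to is actually fairly short; here is a rough sketch, assuming a Linux host with `perf` and the `perf2bolt`/`llvm-bolt` tools installed. `./myapp` and its workload are placeholders, and the flag choices are illustrative rather than canonical.

```shell
# 1. Collect an LBR-based profile (Last Branch Record; needs a CPU and
#    kernel that support -j). The workload should be representative.
perf record -e cycles:u -j any,u -o perf.data -- ./myapp --typical-workload

# 2. Convert the perf profile into BOLT's fdata format.
perf2bolt -p perf.data -o perf.fdata ./myapp

# 3. Rewrite the binary: reorder basic blocks and functions, and split
#    cold code out of hot functions. Function reordering works best if
#    the binary was originally linked with -Wl,--emit-relocs.
llvm-bolt ./myapp -o ./myapp.bolt \
    -data=perf.fdata \
    -reorder-blocks=ext-tsp \
    -reorder-functions=hfsort \
    -split-functions -split-all-cold
```

The integration pain is mostly in step 1 (getting a representative profile into CI), not in the tool invocations themselves.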
-
LLVM Now Using PGO for Its x86_64 Windows Release Binaries: ~22% Faster Builds
Propeller does this: https://github.com/google/llvm-propeller/blob/main/Docs/Opti...
-
Speeding up the Rust compiler without changing its code
Looks like they deleted their branches. The latest commit I could find (via pr#11) seems to have the PDF. https://github.com/google/llvm-propeller/tree/424c3b885e60d8ff9446b16df39d84fbf6596aec
-
Performance variation when moving functions between files
Google also wrote Propeller, which claims to be even better than BOLT, but I don't know why they never drove it all the way upstream. BOLT got merged into LLVM after Propeller claimed to obsolete it. I can only assume there's a lot more to the story than anyone is saying in public.
- AV1-related job offer :O
BOLT
-
Squeezing a Little More Performance Out of Bytecode Interpreters
Hi Stephen, congrats on the nice work! Have you considered using BOLT to optimize the interpreter? What it does is pretty much what has been suggested in this thread: profiling plus code reordering.
-
I didn't find any post-link binary optimizers for Windows executables. Why?
For unstripped Linux ELF binaries there is the BOLT project (BOLT/bolt at main · facebookincubator/BOLT (github.com)). For PE files I found nothing. I would like to know whether any exist, or why none do.
- Why is Rosetta 2 fast?
-
The Rust compiler is now compiled with (thin) LTO (finally) for 5-10% improvements
Google automatically profiles everything running in their datacenters and compiles everything with LTO+PGO on by default. And beyond LTO, both Facebook's BOLT and Google's Propeller can perform additional binary optimizations on top of what regular LTO does.
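As a concrete sketch of the stacking described above, a ThinLTO+PGO build with clang looks roughly like this (the flag names are real clang/LLVM options; the file names and workload are hypothetical):

```shell
# 1. Instrumented build: ThinLTO plus PGO instrumentation.
#    Raw profiles land in ./pgo-data as *.profraw files.
clang -O2 -flto=thin -fprofile-generate=./pgo-data main.c -o app.inst

# 2. Run a representative workload to collect the profile.
./app.inst --representative-workload

# 3. Merge the raw profiles into one indexed profile.
llvm-profdata merge -output=app.profdata ./pgo-data/*.profraw

# 4. Optimized build that consumes the merged profile.
clang -O2 -flto=thin -fprofile-use=app.profdata main.c -o app

# 5. Optionally, run BOLT or Propeller on the result for the extra
#    post-link layout optimizations the comment mentions.
```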
-
Related work on profiling reuse across program versions.
As an example, BOLT (Meta's binary optimizer) uses two strategies to map profiling information from one program onto another. First, it can use the address of branch instructions as anchor points for profiling data: branches that share the same address (offset from the beginning of the function) can reuse profiling information. Another approach is to use a hash formed from the opcodes of the instructions in a basic block as the anchor point. As long as the basic block is not modified, BOLT can reuse its profiling data. This approach was described in the paper "BMAT - a binary matching tool for stale profile propagation". If profiling information cannot be mapped onto the new program, it is said to be stale.
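The opcode-hash idea can be illustrated with a toy experiment (this demonstrates the matching principle only, not BOLT's actual implementation): hash just the instruction mnemonics of a function, and the hash survives the function moving to a different address in a new build. Assumes a Linux host with `cc`, `objdump`, and `sha256sum`.

```shell
# Two versions of the same program; the extra function in v2 shifts
# main() to a different address, but main's instructions are unchanged.
cat > /tmp/v1.c <<'EOF'
int main(void) { return 0; }
EOF
cat > /tmp/v2.c <<'EOF'
int pad(void) { return 42; }   /* extra code shifts main's address */
int main(void) { return 0; }
EOF
cc -O1 /tmp/v1.c -o /tmp/v1
cc -O1 /tmp/v2.c -o /tmp/v2

hash_main() {
  objdump -d "$1" | sed -n '/<main>:/,/^$/p' |   # main's disassembly only
    awk -F'\t' 'NF>=3 {print $3}' |              # drop address + raw bytes
    awk '{print $1}' |                           # keep the mnemonic only
    sha256sum | cut -d' ' -f1
}

hash_main /tmp/v1
hash_main /tmp/v2   # same hash, even though main moved
```

An address-based anchor would break between these two builds; the opcode-based one does not.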
-
CFLAGS , LDFLAGS recommendation for making EMACS LIGHTENING FASTER?
If you're just excited to try out some shiny things then you can take a look at https://github.com/facebookincubator/BOLT, which is like PGO AFAIU. But again, it would definitely help if you had some elisp snippet that measures the performance you care about, so that you can see how much things improved after you enable a flag like -O3 or apply a tool like BOLT.
- Bolt - Optimize Linux Image - Has Anyone Tried at Home?
-
What would it take to get LLVM to align branch targets with memory pages (to double the spatial locality vs. if the targets straddle memory pages)?
The best tool we have for maximizing code locality is probably BOLT; its authors report measurable improvements in icache hit ratios.
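If you want to check such claims on your own binaries, perf can report the relevant counters directly. A minimal sketch, where `./myapp` and `./myapp.bolt` are placeholders for the binary before and after optimization, and the event names are common on x86_64 Linux but vary by CPU:

```shell
# Compare instruction-cache and iTLB behavior before and after BOLT.
perf stat -e L1-icache-load-misses,iTLB-load-misses,instructions -- ./myapp
perf stat -e L1-icache-load-misses,iTLB-load-misses,instructions -- ./myapp.bolt
```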
- AI Benchmark - 11900 Intel Optimized Tensorflow Performance Test
-
AV1-related job offer :O
Imagine PGO, but taken up a notch: https://github.com/facebookincubator/BOLT
What are some alternatives?
coz - Coz: Causal Profiling
rust - Empowering everyone to build reliable and efficient software.
cargo-xtask
linux - Linux kernel source tree
cargo-pgo - Cargo subcommand for optimizing Rust binaries/libraries with PGO and BOLT.