remill
revng-qa
| | remill | revng-qa |
|---|---|---|
| Mentions | 3 | 1 |
| Stars | 1,177 | 6 |
| Growth | 2.6% | - |
| Activity | 6.4 | 8.2 |
| Latest commit | 16 days ago | 10 days ago |
| Language | C++ | Python |
| License | Apache License 2.0 | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
remill
Revng translates (i386, x86-64, MIPS, ARM, AArch64, s390x) binaries to LLVM IR
Usually such things are called lifters. Wonder how this tool compares to other existing LLVM IR lifters, such as remill[0] and rellume[1].
0: https://github.com/lifting-bits/remill
- Decompiler Explorer
- fcd – LLVM-based native program optimizing decompiler
revng-qa
Revng translates (i386, x86-64, MIPS, ARM, AArch64, s390x) binaries to LLVM IR
> the binary code to LLVM IR uplifting loses a lot of context
Losing context is a good thing: it ensures the frontend is properly decoupled from the rest of the pipeline.
We don't even keep track of what a "call" instruction is, we re-detect it on the LLVM IR.
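To give a feel for what "re-detecting a call on the IR" means, here is a toy sketch (illustrative only, not rev.ng's actual implementation): once lifted, an x86 `call` is no longer a distinct opcode but a pair of plain operations, "store the return address, then branch", and a pass can pattern-match that pair back into a call. All names and the tuple-based IR below are invented for the example.

```python
# Each lifted instruction is a (opcode, operands) tuple; addresses are ints.
# A lifted x86 `call 0x402000` at 0x401002 becomes a store of the
# fall-through address to the stack followed by an unconditional branch.
lifted = [
    ("store", ("stack", 0x401007)),  # push of the return (fall-through) address
    ("branch", (0x402000,)),         # jump to the callee
    ("add", ("rax", 4)),             # unrelated follow-on instruction
]

def detect_calls(instructions):
    """Return (callee, return_address) pairs for every
    store-return-address-then-branch pattern in the lifted code."""
    calls = []
    for first, second in zip(instructions, instructions[1:]):
        if (first[0] == "store" and first[1][0] == "stack"
                and second[0] == "branch"):
            calls.append((second[1][0], first[1][1]))
    return calls

print(detect_calls(lifted))  # one call: callee 0x402000, returns to 0x401007
```

The point of the sketch is that "call-ness" is recoverable from the semantics alone, so the lifter does not need to carry it through as metadata.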
One reason you may want to preserve context is to let the user know where a specific piece of lifted code originated from. In order to preserve this information, we exploit LLVM's debugging metadata and it works pretty well. There's some loss there, but LLVM transformations strive to preserve it.
After all, if you have `add rax, 4; add rax, 4`, you'll want to optimize it to a single +8, and then you have to decide whether to associate the +8 operation with the first or the second original instruction.
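The attribution dilemma above can be made concrete with a small model (this mimics the role of LLVM's `!dbg` locations, but is not LLVM code; the representation is invented for the example): each lifted operation carries the address of the original instruction it came from, and a peephole pass that folds the two adds must pick one origin for the merged operation.

```python
# Each op is (opcode, register, immediate, origin address of the original
# machine instruction) -- the last field plays the role of a debug location.

def merge_adds(ops):
    """Fold consecutive adds to the same register into one operation,
    arbitrarily keeping the FIRST instruction's origin address as the
    merged op's 'debug location' (the other origin is lost)."""
    out = []
    for op in ops:
        if out and out[-1][0] == "add" and op[0] == "add" and out[-1][1] == op[1]:
            prev = out.pop()
            out.append(("add", op[1], prev[2] + op[2], prev[3]))  # keep first origin
        else:
            out.append(op)
    return out

ops = [("add", "rax", 4, 0x1000), ("add", "rax", 4, 0x1004)]
print(merge_adds(ops))  # single add of 8, attributed to 0x1000
```

Whichever origin the pass keeps, some provenance is lost, which is exactly the "some loss there" the comment describes.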
> the binary code to LLVM IR uplifting loses a lot of [...] semantics information
Not sure what you mean here, we use QEMU as a lifter and that's very accurate in terms of semantics.
I'm not sure what MIR and Swift IR have to do with the discussion; those are higher-level IRs for specific languages. LLVM IR is rather low-level and language-agnostic.
However, for going beyond lifting (i.e., decompilation), it's true that LLVM shows some significant limitations. That's why we're rolling our own MLIR dialect, but we can still benefit from all the MLIR/LLVM infrastructure, optimizations, and analyses. We're not starting from scratch.
> emulating pieces of the code sparsely to figure out indirect jumps and so on
It's hard to emulate without starting from the beginning. Maybe you're thinking about symbolic execution?
In any case, rev.ng does not emulate and does not do any symbolic execution: we have a data-flow analysis that detects destinations of indirect jumps and it's pretty scalable and effective. Example of things we handle: https://github.com/revng/revng-qa/blob/master/share/revng/te...
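To illustrate the general idea of resolving indirect jumps by data flow rather than emulation, here is a deliberately tiny sketch (not rev.ng's analysis, which is far more sophisticated): collect the constant values each register may hold, then resolve `jmp reg` to that register's value set. The block/tuple IR and all names are invented for the example.

```python
# A program is a list of basic blocks; each block is a list of (op, *args).
# This toy analysis is flow-insensitive: it unions every constant ever
# assigned to a register, then reports that set as the jump's targets.

def possible_jump_targets(blocks):
    """Map block index -> sorted list of possible destinations of any
    indirect jump (`jmp_reg`) in that block."""
    values = {}
    for block in blocks:
        for op, *args in block:
            if op == "mov_const":
                reg, const = args
                values.setdefault(reg, set()).add(const)
    targets = {}
    for i, block in enumerate(blocks):
        for op, *args in block:
            if op == "jmp_reg":
                targets[i] = sorted(values.get(args[0], set()))
    return targets

blocks = [
    [("mov_const", "rax", 0x4010)],  # one arm of a jump table loads 0x4010
    [("mov_const", "rax", 0x4020)],  # the other arm loads 0x4020
    [("jmp_reg", "rax")],            # indirect jump through rax
]
print(possible_jump_targets(blocks))  # block 2 can jump to 0x4010 or 0x4020
```

A real analysis of this kind tracks values through arithmetic, loads from jump tables, and control flow, but the core appeal is the same: it is a static fixpoint computation, so it scales without ever executing the binary.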
What are some alternatives?
llvm-tutor - A collection of out-of-tree LLVM passes for teaching and learning
revng - revng: the core repository of the rev.ng project
fcd - An optimizing decompiler
rellume - Lift machine code to performant LLVM IR
anvill - anvill forges beautiful LLVM bitcode out of raw machine code
rellic - Rellic produces goto-free C output from LLVM bitcode
asmjit - Low-latency machine code generation
mcsema - Framework for lifting x86, amd64, aarch64, sparc32, and sparc64 program binaries to LLVM bitcode