Revng translates (i386, x86-64, MIPS, ARM, AArch64, s390x) binaries to LLVM IR

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rizin

    UNIX-like reverse engineering framework and command-line toolset.

  • Rizin[1] is also able to uplift native code to the new RzIL, which is based on the BAP Core Theory[2] and is essentially an extension of SMT theories of bitvectors, bitvector-indexed arrays of bitvectors and effects[3].

    [1] https://rizin.re/

    [2] https://binaryanalysisplatform.github.io/bap/api/master/bap-...

    [3] https://github.com/rizinorg/rizin/blob/dev/doc/rzil.md

  • revng

    revng: the core repository of the rev.ng project

  • remill

    Library for lifting machine code to LLVM bitcode

  • Usually such things are called lifters. Wonder how this tool compares to other existing LLVM IR lifters, such as remill[0] and rellume[1].

    0: https://github.com/lifting-bits/remill

  • rellume

    Lift machine code to performant LLVM IR

  • revng-qa

    Source for rev.ng test cases

  • > the binary code to LLVM IR uplifting loses a lot of context

    Losing context is actually good: it ensures the frontend is properly decoupled from the rest of the pipeline.

    We don't even keep track of what a "call" instruction is; we re-detect it on the LLVM IR.

    One reason you may want to preserve context is to let the user know where a specific piece of lifted code originated from. In order to preserve this information, we exploit LLVM's debugging metadata and it works pretty well. There's some loss there, but LLVM transformations strive to preserve it.

    After all, imagine you have `add rax, 4; add rax, 4`: you'll want to optimize it to a single +8, and then you have to decide whether to associate the +8 operation with the first or the second instruction.
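The attribution ambiguity described above can be sketched concretely. The following is a toy model, not rev.ng code: `LiftedOp`, `merge_adds`, and the addresses are all invented for illustration.

```python
# Hypothetical sketch (not rev.ng's implementation): merge two lifted
# immediate adds into one, as a peephole pass would, and pick which
# source address the merged operation reports as its debug location.

from dataclasses import dataclass

@dataclass
class LiftedOp:
    opcode: str    # e.g. "add"
    reg: str       # destination register
    imm: int       # immediate operand
    src_addr: int  # address of the original guest instruction

def merge_adds(ops):
    """Fold consecutive immediate adds on the same register into one op.

    The merged op keeps the *first* instruction's address as its debug
    location -- an arbitrary but consistent policy; keeping the second
    would be equally valid, which is exactly the ambiguity in question.
    """
    out = []
    for op in ops:
        prev = out[-1] if out else None
        if (prev is not None and prev.opcode == op.opcode == "add"
                and prev.reg == op.reg):
            prev.imm += op.imm  # fold +4 and +4 into +8
        else:
            out.append(op)
    return out

# `add rax, 4; add rax, 4` lifted from two (made-up) guest addresses:
ops = [LiftedOp("add", "rax", 4, 0x1000),
       LiftedOp("add", "rax", 4, 0x1003)]
merged = merge_adds(ops)
print(merged)  # a single op: add rax, 8, attributed to 0x1000
```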

    > the binary code to LLVM IR uplifting loses a lot of [...] semantics information

    Not sure what you mean here, we use QEMU as a lifter and that's very accurate in terms of semantics.

    I'm not sure what MIR and Swift IR have to do with the discussion; those are higher-level IRs for specific languages. LLVM is rather low-level and language-agnostic.

    However, for going beyond lifting, i.e., decompilation, it's true that LLVM shows some significant limitations. That's why we're rolling our own MLIR dialect, but we can still benefit from all the MLIR/LLVM infrastructure, optimizations, and analyses. We're not starting from scratch.

    > emulating pieces of the code sparsely to figure out indirect jumps and so on

    It's hard to emulate without starting from the beginning. Maybe you're thinking about symbolic execution?

    In any case, rev.ng does not emulate and does not do any symbolic execution: we have a data-flow analysis that detects the destinations of indirect jumps, and it's pretty scalable and effective. An example of the things we handle: https://github.com/revng/revng-qa/blob/master/share/revng/te...
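As a rough illustration of how a data-flow analysis can recover indirect-jump destinations without emulation or symbolic execution, here is a toy value-set propagation over an invented mini-IR (this is not rev.ng's actual pass; the instruction names and addresses are made up):

```python
# Illustrative sketch only: propagate *sets* of possible values through
# a toy instruction stream and thereby recover the destinations of an
# indirect jump through a masked jump table -- a common switch-lowering
# pattern in compiled code.

def possible_targets(program, table):
    """The value set reaching the indirect jump is the recovered
    list of destinations."""
    idx_values = None            # None means "unknown / any value"
    targets = set()
    for op, *args in program:
        if op == "and":          # masking bounds the index register
            mask = args[1]
            idx_values = set(range(mask + 1))
        elif op == "jmp_table":  # jmp table[idx]
            if idx_values is not None:
                targets = {table[i] for i in idx_values if i in table}
    return sorted(targets)

# Toy lifted code: `and idx, 3; jmp table[idx]` with a 4-entry table.
program = [("and", "idx", 3), ("jmp_table", "idx")]
table = {0: 0x400100, 1: 0x400200, 2: 0x400300, 3: 0x400400}
print([hex(t) for t in possible_targets(program, table)])
# → ['0x400100', '0x400200', '0x400300', '0x400400']
```

The point of the sketch: because the mask bounds the index, the analysis can enumerate every reachable table slot without ever executing the code.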

  • QEMU

    Official QEMU mirror. Please see https://www.qemu.org/contribute/ for how to submit changes to QEMU. Pull Requests are ignored. Please only use release tarballs from the QEMU website.

  • > architectural registers are always updated

    In tiny code, the guest registers (global TCG variables) are kept in the host's registers until you either call a helper that can access the CPU state or you return (`git grep la_global_sync`). This is why QEMU is not terribly slow.

    But after a check, this also happens when you access the guest memory address space! https://github.com/qemu/qemu/blob/master/include/tcg/tcg-opc... (TCG_OPF_SIDE_EFFECTS is what matters)

    But still, in the end, it's the same problem. What QEMU does can be done in LLVM too. You could probably be more efficient in LLVM by using the exception-handling mechanism (`invoke` and friends) to serialize back to memory only when there's an actual exception, at the cost of higher register pressure. That's more or less what we do here: https://rev.ng/downloads/bar-2019-paper.pdf
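The `invoke`-based idea can be mimicked in miniature: keep guest registers in fast locals on the hot path, and write them back to the memory-backed CPU state only on the exceptional path (or at block exit). Everything below is invented for illustration; it is not QEMU or rev.ng code.

```python
# Hedged sketch of lazy state serialization: locals stand in for host
# registers, the CPUState object stands in for the in-memory guest
# state, and the except handler plays the role of an LLVM landing pad.

class CPUState:
    def __init__(self):
        self.rax = 0
        self.rbx = 0

class GuestFault(Exception):
    """Stands in for a faulting guest memory access."""

def run_block(cpu, trip_count, fault_at=None):
    # "Host registers": plain locals, cheap to update.
    rax, rbx = cpu.rax, cpu.rbx
    try:
        for i in range(trip_count):
            if i == fault_at:
                raise GuestFault(i)  # exceptional path
            rax += 4                 # hot path: no CPUState traffic
            rbx ^= rax
    except GuestFault:
        # "Landing pad": serialize only now, when the architectural
        # state must actually be observable.
        cpu.rax, cpu.rbx = rax, rbx
        raise
    # Normal exit: a single write-back at the end of the block.
    cpu.rax, cpu.rbx = rax, rbx

cpu = CPUState()
run_block(cpu, 10)
print(cpu.rax)  # → 40
```

The trade-off mentioned above shows up even here: the live values `rax`/`rbx` must stay available across the whole block, which in real code translates to higher register pressure.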

