LLM4Decompile: Decompiling Binary Code with LLM

rizin

46 2,436 9.8 C

UNIX-like reverse engineering framework and command-line toolset.
LLM4Decompile

2 2,415 8.7 Python

Reverse Engineering: Decompiling Binary Code with Large Language Models
deepcompyle

1 0 6.1 Python

Pretraining transformers to decompile Python bytecodes

Hey, I am working on my own LLM-based decompiler for Python bytecode (https://github.com/kukas/deepcompyle). I feel there are not many people working on this research direction but I think it could be quite interesting, especially now that longer attention contexts are becoming feasible. If anyone knows a team that is working on this, I would be quite interested in cooperation.
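For readers who have not looked at Python bytecode before, here is a minimal sketch of the kind of (bytecode, source) pair a learned decompiler would be trained on, built with the standard `dis` module. The helper name `bytecode_source_pair` and the sample function are hypothetical and are not taken from deepcompyle; the exact opcode names also differ between CPython versions.

```python
import dis
import types


def bytecode_source_pair(source: str) -> tuple[list[str], str]:
    """Compile `source` and return (opcode names of its first function, source).

    Hypothetical helper for illustration only; not part of deepcompyle.
    """
    module_code = compile(source, "<sample>", "exec")
    # A function body is stored as a nested code object in co_consts.
    func_code = next(
        const for const in module_code.co_consts
        if isinstance(const, types.CodeType)
    )
    opnames = [ins.opname for ins in dis.get_instructions(func_code)]
    return opnames, source


SAMPLE = "def add(a, b):\n    return a + b\n"
opnames, src = bytecode_source_pair(SAMPLE)
print(opnames)  # opcode names vary by CPython version
```

The decompilation task is the reverse mapping: given only the opcode sequence (plus constants and names), recover text equivalent to `src`.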

ghidra_tools

6 281 5.4 Python

A collection of Ghidra scripts, including the GPT-3 powered code analyser and annotator, G-3PO.

relevant: https://news.ycombinator.com/item?id=34250872 (G-3PO: A protocol droid for Ghidra, or GPT-3 for reverse-engineering <https://github.com/tenable/ghidra_tools/blob/main/g3po/g3po....>; Jan, 2023; 44 comments)

aici

6 1,714 9.9 Rust

AICI: Prompts as (Wasm) Programs

I have been planning to work on something like this. I think that eventually, someone will crack the "binary in -> good source code out of LLM" pipeline but we are probably a few years away from that still. I say a few years because I don't think there's a huge pile of money sitting at the end of this problem, but maybe I'm wrong.
A really good "stop-gap" approach would be to build a decompilation pipeline using Ghidra in headless mode and then combine the strict syntax correctness of a decompiler with the "intuition/system 1 skills" of an LLM. My inspiration for this setup comes from two recent advancements, both shared here on HN:
1. AlphaGeometry: The Decompiler and the LLM should complement each other, covering each other's weaknesses. https://deepmind.google/discover/blog/alphageometry-an-olymp...
2. AICI: We need a better way of "hacking" on top of these models, and something like AICI could serve as the "glue" that coordinates the generation of C source. I don't really want the weights of my LLM to be spent on generating syntactically correct C source; I want the LLM to think in terms of variable names, "snippet patterns" and architectural choices while other tools (Ghidra, LLVM) worry about the rest. https://github.com/microsoft/aici
Obviously this is all hand-wavy armchair commentary from a former grad student who just thinks this stuff is cool. Huge props to these researchers for diving into this. I know the authors already mentioned incorporating Ghidra into their future work, so I think they're on the right track.
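The division of labor described in this comment can be sketched roughly as follows: the decompiler output stays structurally untouched, the model only proposes identifier names, and a deterministic step validates and applies them. This is a minimal sketch under assumptions, not the LLM4Decompile or AICI method. `apply_renames`, `fake_llm`, and the abbreviated keyword set are hypothetical, and `DECOMPILED` is a hand-written sample in the style of Ghidra's default names rather than real tool output.

```python
import re
from typing import Callable, Dict

# Abbreviated C keyword list; an assumption that is good enough for a sketch.
C_KEYWORDS = {"int", "char", "long", "void", "return", "if", "else", "for", "while"}
IDENT = re.compile(r"\b[A-Za-z_][A-Za-z0-9_]*\b")


def apply_renames(pseudo_c: str, suggest: Callable[[str], Dict[str, str]]) -> str:
    """Apply model-suggested renames, rejecting anything that could break the code."""
    proposed = suggest(pseudo_c)  # stand-in for an LLM call
    existing = set(IDENT.findall(pseudo_c))
    accepted: Dict[str, str] = {}
    for old, new in proposed.items():
        valid = (
            old in existing                       # only rename names that exist
            and IDENT.fullmatch(new) is not None  # must be a legal identifier
            and new not in C_KEYWORDS             # must not be a keyword
            and new not in existing               # must not collide with the code
            and new not in accepted.values()      # must not collide with a rename
        )
        if valid:
            accepted[old] = new
    if not accepted:
        return pseudo_c
    # One pass over the text, so a new name is never renamed a second time.
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, accepted)) + r")\b")
    return pattern.sub(lambda m: accepted[m.group(1)], pseudo_c)


DECOMPILED = """int FUN_00101139(int param_1, int param_2)
{
  int iVar1;
  iVar1 = param_1 + param_2;
  return iVar1;
}"""


def fake_llm(code: str) -> Dict[str, str]:
    # Hypothetical model output, including two suggestions that must be rejected.
    return {"FUN_00101139": "add", "param_1": "a", "param_2": "b",
            "iVar1": "int", "missing": "x"}


print(apply_renames(DECOMPILED, fake_llm))
```

Here the suggestion to rename `iVar1` to `int` is dropped because it is a keyword, and `missing` is dropped because no such identifier exists, so a bad model answer degrades naming quality but not syntactic correctness.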

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Reverse Engineering AI Decompile program-analysis Rust
Post date: 17 Mar 2024
