LLM4Decompile vs aici

| | LLM4Decompile | aici |
|---|---|---|
| Mentions | 2 | 6 |
| Stars | 2,491 | 1,743 |
| Growth | - | 6.8% |
| Activity | 8.7 | 9.9 |
| Last commit | 23 days ago | 6 days ago |
| Language | Python | Rust |
| License | MIT License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
-
HonoJS: Small, simple, and ultrafast web framework for the Edges
Have you looked at AICI by Microsoft yet?
https://github.com/microsoft/aici/
-
LLM4Decompile: Decompiling Binary Code with LLM
I have been planning to work on something like this. I think that eventually someone will crack the "binary in -> good source code out" LLM pipeline, but we are probably a few years away from that still. I say a few years because I don't think there's a huge pile of money sitting at the end of this problem, but maybe I'm wrong.
A really good "stop-gap" approach would be a decompilation pipeline that runs Ghidra in headless mode, combining the strict syntactic correctness of a decompiler with the "intuition/System 1 skills" of an LLM. My inspiration for this setup comes from two recent advancements, both shared here on HN:
1. AlphaGeometry: The Decompiler and the LLM should complement each other, covering each other's weaknesses. https://deepmind.google/discover/blog/alphageometry-an-olymp...
2. AICI: We need a better way of "hacking" on top of these models, using something like AICI as the "glue" that coordinates the generation of C source. I don't really want my LLM's weights spent on generating syntactically correct C; I want the LLM to think in terms of variable names, "snippet patterns", and architectural choices while other tools (Ghidra, LLVM) worry about the rest. https://github.com/microsoft/aici
Obviously this is all hand-wavy armchair commentary from a former grad student who just thinks this stuff is cool. Huge props to these researchers for diving into this. The authors already mentioned incorporating Ghidra in their future work, so they're clearly on the right track.
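A minimal sketch of that division of labor, with hypothetical names throughout (the pseudo-C, the identifiers, and the stubbed LLM call are all illustrative, not real Ghidra output or a real model API): the decompiler's syntactically correct output is left structurally untouched, and the "LLM" only proposes identifier renames, applied with word-boundary substitution so syntax can never break.

```python
import re

# Illustrative pseudo-C in the style a decompiler emits (not real Ghidra output).
PSEUDO_C = """
int FUN_00101149(int param_1, int param_2) {
    int local_8 = param_1 + param_2;
    return local_8;
}
"""

def propose_renames(pseudo_c: str) -> dict:
    """Stub standing in for an LLM call: map opaque decompiler identifiers
    to meaningful names. A real pipeline would prompt a model with the
    pseudo-C and parse its suggested renames."""
    return {
        "FUN_00101149": "add_ints",
        "param_1": "lhs",
        "param_2": "rhs",
        "local_8": "sum",
    }

def apply_renames(pseudo_c: str, renames: dict) -> str:
    """Apply renames with \\b word boundaries: the decompiler guarantees
    syntactic correctness, the LLM only decides what things are called."""
    for old, new in renames.items():
        pseudo_c = re.sub(rf"\b{re.escape(old)}\b", new, pseudo_c)
    return pseudo_c

print(apply_renames(PSEUDO_C, propose_renames(PSEUDO_C)))
```

The point of the word-boundary substitution is that the LLM's output can be wrong without ever producing uncompilable code; a bad name is recoverable, a broken parse tree is not.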
-
Show HN: Prompts as (WASM) Programs
We believe Guidance can run on top of AICI (we're working on an efficient Earley parser for that [0], together with local Guidance folks). AICI is generally lower level (though our sample controllers are at a similar level to Guidance).
[0] https://github.com/microsoft/aici/blob/main/controllers/aici...
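The core idea behind controllers like these can be illustrated without AICI's real API (everything below is a toy: the grammar table, the mock model, and the loop are all invented for illustration). At each decoding step the controller computes which next tokens keep the output inside a grammar, and the sampler may only choose from that set, so the final string is valid by construction.

```python
# Toy grammar-constrained decoding (not AICI's actual API): for each
# prefix, list the tokens the grammar allows next.
GRAMMAR_NEXT = {
    (): {"{"},
    ("{",): {'"key"'},
    ("{", '"key"'): {":"},
    ("{", '"key"', ":"): {'"value"'},
    ("{", '"key"', ":", '"value"'): {"}"},
}

def mock_model(prefix, allowed):
    """Stand-in for an LLM: picks its 'favorite' among the allowed tokens.
    A real controller would mask the model's logits to this set instead."""
    return sorted(allowed)[0]

def generate():
    out = []
    # Keep sampling while the grammar still allows a continuation.
    while tuple(out) in GRAMMAR_NEXT:
        allowed = GRAMMAR_NEXT[tuple(out)]
        out.append(mock_model(out, allowed))
    return " ".join(out)

print(generate())  # output is valid under the tiny grammar by construction
```

An Earley parser generalizes the lookup table here: instead of enumerating every prefix, it computes the allowed-token set for arbitrary context-free grammars on the fly.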
- AI Controller Interface (AICI)
What are some alternatives?
deepcompyle - Pretraining transformers to decompile Python bytecodes
transformers-CFG - 🤗 A specialized library for integrating context-free grammars (CFG) in EBNF with Hugging Face Transformers
arroyo - Distributed stream processing engine in Rust
ghidra_tools - A collection of Ghidra scripts, including the GPT-3 powered code analyser and annotator, G-3PO.
flyde - 🌟 Open-source, visual programming for developers. Includes a VS Code extension, integrates with existing TypeScript code, browser and Node.js.
pingora - A library for building fast, reliable and evolvable network services.
Awesome-LLM-Productization - Awesome-LLM-Productization: a curated list of tools/tricks/news/regulations about AI and Large Language Model (LLM) productization
sglang - SGLang is a structured generation language designed for large language models (LLMs). It makes your interaction with models faster and more controllable.