30 Years of Decompilation and the Unsolved Structuring Problem: Part 1

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • PyCParser

    C parser and interpreter written in Python with automatic ctypes interface generation (by albertz)

  • A funny anecdote: Some time ago, I was writing a C-to-Python translator.

    (Why? Just for fun, https://github.com/albertz/PyCParser, even more just-for-fun goal was this: https://github.com/albertz/PyCPython).

    It would literally translate the C code into equivalent Python code, using ctypes heavily. It was mostly straightforward, except for mapping goto (which is how it relates to this control-flow structuring problem).

    Of course, there are some hacks to introduce goto in Python, which in many cases would operate on the Python bytecode, which actually has the JUMP_ABSOLUTE op, but there are also other ways (https://stackoverflow.com/questions/6959360/goto-in-python).

    I could also have translated C directly to equivalent Python bytecode and not Python source code, but I really wanted to have Python source code.

    My ugly solution worked basically like this: whenever there was a goto in a function, it would translate the function as follows:

    First, we flatten any Python AST into a series of statements,
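The comment is cut off at this point, so the remaining steps are not known. As a rough illustration only, one common way to emulate goto in plain Python source is a dispatch loop over a state variable. The sketch below is not the commenter's code: the function `sum_to`, the variable `label`, and the label names are all hypothetical, and the C original it mimics is invented for the example.

```python
def sum_to(n):
    """Hand-translated from a hypothetical C function:

        int i = 0, total = 0;
      loop:
        if (i > n) goto done;
        total += i; i++;
        goto loop;
      done:
        return total;
    """
    i = 0
    total = 0
    label = "loop"  # hypothetical state variable: which block runs next
    while True:
        if label == "loop":
            if i > n:
                label = "done"  # goto done
                continue
            total += i
            i += 1
            label = "loop"  # goto loop
            continue
        if label == "done":
            return total


print(sum_to(4))
```

Each labelled block becomes one branch of the loop, and every goto becomes an assignment to `label` followed by `continue`. The result is valid Python source, but it is far less readable than a hand-written loop, which is the heart of the structuring problem.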

  • PyCPython

    interpret CPython in pure Python

  • angr

    A powerful and user-friendly binary analysis platform!

  • That's awesome! That's exactly how modern decompilers deal with a special type of goto occurrence. They reduce gotos (or eliminate them completely) by introducing a `while(true)` loop, with corresponding `continue` and `break` statements inside it. We all know, of course, that the `while(true)` did not exist in the source, but it's a nice hack!

    We even do this in the angr decompiler, found here: https://github.com/angr/angr/blob/8e48d001e18a913ecd4ed2e995...
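
To make the `while(true)` trick concrete, here is a small hand-written sketch in Python. It is not taken from angr or any other decompiler. The function `collatz_steps` and the goto-style pseudocode in its docstring are assumptions made up for illustration.

```python
def collatz_steps(n):
    """Structured form of this hypothetical goto-style code:

        steps = 0
      head:
        if n == 1 goto exit
        steps += 1
        if n is even: n = n / 2; goto head
        n = 3 * n + 1
        goto head
      exit:
        return steps
    """
    steps = 0
    while True:  # synthetic loop: not present in the goto version
        if n == 1:
            break  # was: goto exit
        steps += 1
        if n % 2 == 0:
            n //= 2
            continue  # was: goto head
        n = 3 * n + 1
        # falling off the end of the body is the final "goto head"
    return steps


print(collatz_steps(6))
```

Jumps back to the loop head become `continue`, and jumps to the code just after the loop become `break`. Gotos that fit neither pattern cannot be removed this way, which is why the comment calls it a special type of goto occurrence.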
