A complete compiler and VM in 150 lines of code

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

InfluxDB - Power Real-Time Data Analytics at Scale
Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
www.influxdata.com
featured
SaaSHub - Software Alternatives and Reviews
SaaSHub helps you find the best software and product alternatives
www.saashub.com
featured
  • parson

    Yet another PEG parser combinator library and DSL (by darius)

  • That's powerful enough to conveniently write, for example, a numerical root finding program for an arbitrary arithmetic expression.

    But I think that within a complexity budget of 150 lines of code you can maybe be even more ambitious than that.

    The example compiler in https://github.com/darius/parson/blob/master/eg_calc_compile... is a bit more stripped down than that, but in its 32 lines of code it compiles arithmetic assignment statements to a three-address RISC-like code (though using an unbounded number of registers). https://github.com/darius/parson/blob/master/eg_calc_to_rpn.... is a 16-line version that compiles the same language to a stack machine like your tutorial example.

    In 66 lines of code in https://github.com/kragen/peg-bootstrap/blob/master/peg.md I wrote an example compiler which compiles a PEG grammar into a JavaScript parser for that grammar. Admittedly those 66 lines do not include an implementation of JavaScript to run the code on. It compiles the language it's written in.

    In 132 lines of code in https://github.com/kragen/stoneknifeforth/blob/master/tinybo... I wrote an example compiler which compiles a crippled Forth dialect into i386 machine code, including an ELF header so you can run the result. It also compiles the language it's written in. It also doesn't include an i386 emulator to run it on.

    In 83 lines of code in http://canonical.org/~kragen/sw/dev3/neelcompiler.ml Neel Krishnaswami wrote a compiler from the untyped λ-calculus to a simple assembly language for a register machine. It also doesn't include an implementation of the assembly language.

    In 18 lines of code in http://canonical.org/~kragen/sw/dev3/meta5ix.m5, a simplification of META-II, I wrote a compiler from grammar descriptions to an assembly code for a parsing-oriented virtual machine. It compiles the language it's written in. A Python interpreter for the machine is in http://canonical.org/~kragen/sw/dev3/meta5ixrun.py (109 lines of code) and a precompiled version of the compiler-compiler for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm.

    A slightly incompatible variant of Meta5ix which instead compiles itself to C is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (133 lines of code, depending on how you count). (No C compiler is included.) The precompiled C output for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.c.

    Meta5ix is extremely weak and limited, really only enough for a compiler front-end; it can't, for example, do the kinds of RPN tricks we're talking about above.

  • peg-bootstrap

    A PEG that compiles itself.

  • That's powerful enough to conveniently write, for example, a numerical root finding program for an arbitrary arithmetic expression.

    But I think that within a complexity budget of 150 lines of code you can maybe be even more ambitious than that.

    The example compiler in https://github.com/darius/parson/blob/master/eg_calc_compile... is a bit more stripped down than that, but in its 32 lines of code it compiles arithmetic assignment statements to a three-address RISC-like code (though using an unbounded number of registers). https://github.com/darius/parson/blob/master/eg_calc_to_rpn.... is a 16-line version that compiles the same language to a stack machine like your tutorial example.

    In 66 lines of code in https://github.com/kragen/peg-bootstrap/blob/master/peg.md I wrote an example compiler which compiles a PEG grammar into a JavaScript parser for that grammar. Admittedly those 66 lines do not include an implementation of JavaScript to run the code on. It compiles the language it's written in.

    In 132 lines of code in https://github.com/kragen/stoneknifeforth/blob/master/tinybo... I wrote an example compiler which compiles a crippled Forth dialect into i386 machine code, including an ELF header so you can run the result. It also compiles the language it's written in. It also doesn't include an i386 emulator to run it on.

    In 83 lines of code in http://canonical.org/~kragen/sw/dev3/neelcompiler.ml Neel Krishnaswami wrote a compiler from the untyped λ-calculus to a simple assembly language for a register machine. It also doesn't include an implementation of the assembly language.

    In 18 lines of code in http://canonical.org/~kragen/sw/dev3/meta5ix.m5, a simplification of META-II, I wrote a compiler from grammar descriptions to an assembly code for a parsing-oriented virtual machine. It compiles the language it's written in. A Python interpreter for the machine is in http://canonical.org/~kragen/sw/dev3/meta5ixrun.py (109 lines of code) and a precompiled version of the compiler-compiler for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm.

    A slightly incompatible variant of Meta5ix which instead compiles itself to C is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (133 lines of code, depending on how you count). (No C compiler is included.) The precompiled C output for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.c.

    Meta5ix is extremely weak and limited, really only enough for a compiler front-end; it can't, for example, do the kinds of RPN tricks we're talking about above.

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

    InfluxDB logo
  • stoneknifeforth

    a tiny self-hosted Forth implementation

  • That's powerful enough to conveniently write, for example, a numerical root finding program for an arbitrary arithmetic expression.

    But I think that within a complexity budget of 150 lines of code you can maybe be even more ambitious than that.

    The example compiler in https://github.com/darius/parson/blob/master/eg_calc_compile... is a bit more stripped down than that, but in its 32 lines of code it compiles arithmetic assignment statements to a three-address RISC-like code (though using an unbounded number of registers). https://github.com/darius/parson/blob/master/eg_calc_to_rpn.... is a 16-line version that compiles the same language to a stack machine like your tutorial example.

    In 66 lines of code in https://github.com/kragen/peg-bootstrap/blob/master/peg.md I wrote an example compiler which compiles a PEG grammar into a JavaScript parser for that grammar. Admittedly those 66 lines do not include an implementation of JavaScript to run the code on. It compiles the language it's written in.

    In 132 lines of code in https://github.com/kragen/stoneknifeforth/blob/master/tinybo... I wrote an example compiler which compiles a crippled Forth dialect into i386 machine code, including an ELF header so you can run the result. It also compiles the language it's written in. It also doesn't include an i386 emulator to run it on.

    In 83 lines of code in http://canonical.org/~kragen/sw/dev3/neelcompiler.ml Neel Krishnaswami wrote a compiler from the untyped λ-calculus to a simple assembly language for a register machine. It also doesn't include an implementation of the assembly language.

    In 18 lines of code in http://canonical.org/~kragen/sw/dev3/meta5ix.m5, a simplification of META-II, I wrote a compiler from grammar descriptions to an assembly code for a parsing-oriented virtual machine. It compiles the language it's written in. A Python interpreter for the machine is in http://canonical.org/~kragen/sw/dev3/meta5ixrun.py (109 lines of code) and a precompiled version of the compiler-compiler for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix.generated.m5asm.

    A slightly incompatible variant of Meta5ix which instead compiles itself to C is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.m5 (133 lines of code, depending on how you count). (No C compiler is included.) The precompiled C output for bootstrapping is in http://canonical.org/~kragen/sw/dev3/meta5ix2c.c.

    Meta5ix is extremely weak and limited, really only enough for a compiler front-end; it can't, for example, do the kinds of RPN tricks we're talking about above.

  • proofs

    My personal repository of formally verified mathematics.

  • For anyone who wants to learn Coq, I've just finished writing a tutorial [1] that is aimed at programmers (rather than, say, computer scientists). It covers topics like programming with dependent types, writing proofs as data, universes & other type theory stuff, and extracting verified code—with exercises. I hope people find it useful, and any feedback would be appreciated!

    [1] https://github.com/stepchowfun/proofs/tree/main/proofs/Tutor...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts

  • A Taste of Coq and Correct Code by Construction

    3 projects | news.ycombinator.com | 3 Sep 2023
  • Formally Verifying Rust's Opaque Types

    2 projects | news.ycombinator.com | 1 Aug 2022
  • Translation of the Rust's core and alloc crates to Coq for formal verification

    3 projects | news.ycombinator.com | 15 May 2024
  • Kami: A Platform for Hardware Specification and Verification

    1 project | news.ycombinator.com | 28 Dec 2023
  • Change of Name: Coq –> The Rocq Prover

    3 projects | news.ycombinator.com | 26 Dec 2023