Parser generators vs. handwritten parsers: surveying major languages in 2021

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • GitHub repo llvm-project

    The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Note: the repository does not accept github pull requests at this moment. Please submit your patches at http://reviews.llvm.org.

    It seems to me that the parsing code in clang is distributed over multiple files which together are way more than 3000 lines: https://github.com/llvm/llvm-project/tree/llvmorg-12.0.1/cla...

  • GitHub repo Lark

    Lark is a parsing toolkit for Python, built with a focus on ergonomics, performance and modularity.

    I know SPARK's docstring use influenced PLY.

    PLY doesn't use Earley, but "Earley" does come up in the show notes of an interview with Beazley, PLY's author, at https://www.pythonpodcast.com/episode-95-parsing-and-parsers... . No transcript, and I'm not going to listen to it just to figure out the context.

    https://github.com/lark-parser/lark "implements both Earley(SPPF) and LALR(1)".

    Kegler, the author of that timeline I linked to, is the author of Marpa. Home page is http://savage.net.au/Marpa.html . The most recent HN comments about it are from a year ago, at https://news.ycombinator.com/item?id=24321395 .

  • GitHub repo adama-lang

    A programming language for board games powered by the JVM. It is a data-centric programming language which enables building tiny persistent game servers which radically reduce engineering and operational costs for board games.

    When I switched from ANTLR to a handwritten parser for Adama ( http://www.adama-lang.org/ ), I felt way better about things. I was able to get sane error messages, and I could better annotate my syntax tree with comments and line/char numbers.

    A killer feature for a parser generator would be the ability to auto-generate a pretty printer, which requires stuffing comments into the tree as a "meta token".
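
    A minimal Python sketch of the idea in the two comments above. It is not Adama's actual code, and the names Token and tokenize are hypothetical. It shows a handwritten tokenizer that records a line and column for every token and attaches any preceding comments to the next token, so a later pretty printer could emit them again.

```python
import re
from dataclasses import dataclass, field


# Hypothetical token shape: source position plus the comments seen before it.
@dataclass
class Token:
    kind: str
    text: str
    line: int
    col: int
    comments: list = field(default_factory=list)


TOKEN_RE = re.compile(r"""
    (?P<comment>//[^\n]*)
  | (?P<newline>\n)
  | (?P<space>[ \t]+)
  | (?P<ident>[A-Za-z_]\w*)
  | (?P<number>\d+)
  | (?P<symbol>[^\s\w])
""", re.VERBOSE)


def tokenize(src):
    """Return tokens with 1-based line/col; comments ride on the next token."""
    tokens, pending = [], []
    line, line_start, pos = 1, 0, 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        col = pos - line_start + 1
        if m is None:
            # A handwritten lexer can report exactly where it got stuck.
            raise SyntaxError(
                f"unexpected character {src[pos]!r} at line {line}, col {col}"
            )
        kind, text = m.lastgroup, m.group()
        pos = m.end()
        if kind == "newline":
            line += 1
            line_start = pos
        elif kind == "comment":
            pending.append(text)
        elif kind != "space":
            tokens.append(Token(kind, text, line, col, pending))
            pending = []
    # Comments after the last token are dropped in this sketch.
    return tokens
```

    For example, tokenize("// score\nx = 42") yields a token for x at line 2, column 1, whose comments list holds "// score".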

  • GitHub repo IParse

    IParse: an interpreting parser written in C++

    I implemented an unparse function in IParse, which is not a parser generator, but a parser that interprets a grammar. See for example https://github.com/FransFaase/IParse/blob/master/software/c_... where symbols starting with a backslash act as a kind of whitespace terminal during the unparse. For example, \inc increments the indentation and \dec decrements it. The \s marker indicates that a space should be emitted at that location.
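
    A rough Python sketch of how such layout markers could drive an unparser. This is not IParse's implementation (IParse is written in C++). The unparse function and the \nl newline marker are assumptions made for illustration; only \inc, \dec and \s come from the comment above.

```python
def unparse(items, indent_unit="    "):
    """Join text items into a string, obeying layout markers.

    Markers (strings beginning with a backslash):
      \\inc  increase the indentation depth
      \\dec  decrease the indentation depth
      \\s    emit a single space (ignored at the start of a line)
      \\nl   start a new line (hypothetical, not taken from IParse)
    Any other item is emitted as literal text.
    """
    out = []
    depth = 0
    at_line_start = True
    for item in items:
        if item == r"\inc":
            depth += 1
        elif item == r"\dec":
            depth = max(0, depth - 1)
        elif item == r"\nl":
            out.append("\n")
            at_line_start = True
        elif item == r"\s":
            if not at_line_start:
                out.append(" ")
        else:
            if at_line_start:
                # Indentation is applied lazily, when text actually appears.
                out.append(indent_unit * depth)
                at_line_start = False
            out.append(item)
    return "".join(out)
```

    Because the indentation is applied only when text is emitted, a \dec placed before the newline that precedes a closing brace puts that brace back at the outer level.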

  • GitHub repo ruby

    The Ruby Programming Language [mirror]

    The Ruby yacc file is scary to look at. 13+ thousand lines in a single file.

    Would it be better hand-rolled, so that they could have abstracted and organized some things, or does it all make sense in its current format once you are familiar with it?

    https://github.com/ruby/ruby/blob/v3_0_2/parse.y

  • GitHub repo nearley

    📜🔜🌲 Simple, fast, powerful parser toolkit for JavaScript.

  • GitHub repo Fast Parse

    Writing Fast Parsers Fast in Scala

    Agreed! I would say that parser combinators are the sweet spot and the right choice in most cases.

    Scala has them as well, e.g.: https://com-lihaoyi.github.io/fastparse/

    And the good thing is, you don't have to learn a completely new language/syntax, you can use the host language's syntax and you have full IDE support as well.
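
    To illustrate the point about staying in the host language, here is a small parser combinator sketch in plain Python. It does not use or mirror the FastParse API. The helpers regex, seq, many, fmap and evaluate are hypothetical names. A parser here is just a function taking (text, pos) and returning (value, new_pos), or None on failure.

```python
import re


def regex(pattern):
    """Parser that matches a regular expression at the current position."""
    compiled = re.compile(pattern)

    def parse(text, pos):
        m = compiled.match(text, pos)
        return (m.group(), m.end()) if m else None
    return parse


def seq(*parsers):
    """Run parsers one after another; fail if any of them fails."""
    def parse(text, pos):
        values = []
        for p in parsers:
            result = p(text, pos)
            if result is None:
                return None
            value, pos = result
            values.append(value)
        return values, pos
    return parse


def many(parser):
    """Apply a parser zero or more times, collecting the results."""
    def parse(text, pos):
        values = []
        while True:
            result = parser(text, pos)
            # Stop on failure, or if nothing was consumed (avoids looping forever).
            if result is None or result[1] == pos:
                return values, pos
            value, pos = result
            values.append(value)
    return parse


def fmap(parser, fn):
    """Transform the value produced by a parser."""
    def parse(text, pos):
        result = parser(text, pos)
        if result is None:
            return None
        value, new_pos = result
        return fn(value), new_pos
    return parse


# Grammar: sum := number ("+" number)*
number = fmap(regex(r"\s*\d+"), lambda s: int(s.strip()))
plus = regex(r"\s*\+")
sum_expr = fmap(
    seq(number, many(seq(plus, number))),
    lambda v: v[0] + sum(n for _, n in v[1]),
)


def evaluate(text):
    """Parse and evaluate a sum, rejecting trailing unparsed input."""
    text = text.strip()
    result = sum_expr(text, 0)
    if result is None or result[1] != len(text):
        raise ValueError(f"cannot parse {text!r}")
    return result[0]
```

    The grammar is ordinary Python values and function calls, so editor completion, debuggers and type checkers work on it without a separate grammar file or build step.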

  • GitHub repo dmd

    dmd D Programming Language compiler

    Just read the code for an existing one like:

    https://github.com/dlang/dmd/blob/master/src/dmd/cparse.d

    which is a C parser. It's not hard to follow.

  • GitHub repo Crate

    CrateDB is a distributed SQL database that makes it simple to store and analyze massive amounts of machine data in real-time.
