Why I love Rust for tokenising and parsing

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • librdx

    Replicated Data eXchange format C lib

    https://github.com/gritzko/librdx/blob/master/JSON.lex

    In fact, that eBNF only produces the lexer. The parser part is not that impressive either: 120 LoC and quite repetitive (https://github.com/gritzko/librdx/blob/master/JSON.c).

    So, I believe, a parser infrastructure evolves till it only needs eBNF to make a parser. That is the saturation point.

  • chesskell

    Chess. In Haskell! (by ryandv)

    I don't know. Having written a small parser [0] for Forsyth-Edwards chess notation [1], I think Haskell takes the cake here in terms of simplicity and legibility. It reads almost as clearly as BNF, and there is very little technical ceremony involved, letting you focus on the actual grammar of whatever it is you are trying to parse.

    [0] https://github.com/ryandv/chesskell/blob/master/src/Chess/Fa...

    [1] https://en.wikipedia.org/wiki/Forsyth%E2%80%93Edwards_Notati...
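
    For comparison with the Rust approach the post is about, here is a minimal hand-written sketch that parses only the first FEN field (piece placement). It is not taken from chesskell; the names Color, Kind, Square, parse_piece, parse_rank and parse_placement are hypothetical, and the remaining FEN fields (side to move, castling, en passant, clocks) are left out.

```rust
// Sketch only: parses the piece-placement field of a FEN string.
// All type and function names here are illustrative assumptions.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Color { White, Black }

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Kind { King, Queen, Rook, Bishop, Knight, Pawn }

/// `None` is an empty square.
pub type Square = Option<(Color, Kind)>;

/// Uppercase letters are white pieces, lowercase letters are black pieces.
fn parse_piece(c: char) -> Option<(Color, Kind)> {
    let color = if c.is_ascii_uppercase() { Color::White } else { Color::Black };
    let kind = match c.to_ascii_lowercase() {
        'k' => Kind::King,
        'q' => Kind::Queen,
        'r' => Kind::Rook,
        'b' => Kind::Bishop,
        'n' => Kind::Knight,
        'p' => Kind::Pawn,
        _ => return None,
    };
    Some((color, kind))
}

/// A rank is a run of piece letters and digits; a digit 1-8 means that many empty squares.
fn parse_rank(rank: &str) -> Result<Vec<Square>, String> {
    let mut squares: Vec<Square> = Vec::with_capacity(8);
    for c in rank.chars() {
        match c {
            '1'..='8' => {
                let n = c.to_digit(10).unwrap() as usize; // safe: c is '1'..='8'
                squares.extend(std::iter::repeat(None).take(n));
            }
            _ => match parse_piece(c) {
                Some(piece) => squares.push(Some(piece)),
                None => return Err(format!("unexpected character {c:?}")),
            },
        }
    }
    if squares.len() == 8 {
        Ok(squares)
    } else {
        Err(format!("rank {rank:?} describes {} squares, expected 8", squares.len()))
    }
}

/// Eight ranks separated by '/', listed from rank 8 down to rank 1.
pub fn parse_placement(field: &str) -> Result<Vec<Vec<Square>>, String> {
    let ranks = field
        .split('/')
        .map(parse_rank)
        .collect::<Result<Vec<_>, _>>()?;
    if ranks.len() == 8 {
        Ok(ranks)
    } else {
        Err(format!("found {} ranks, expected 8", ranks.len()))
    }
}
```

    Calling parse_placement("rnbqkbnr/pppppppp/8/8/8/8/PPPPPPPP/RNBQKBNR") yields eight rows of eight squares, with row 0 holding Black's back rank. The validation checks are what add most of the ceremony the comment refers to.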

  • grammars-v4

    Grammars written for ANTLR v4; expectation that the grammars are free of actions.

    Perhaps this ANTLR v4 sqlite grammar? [1]

    --

    1: https://github.com/antlr/grammars-v4/tree/master/sql/sqlite

  • CG-SQL-author

    CG-SQL Author's Cut: CG/SQL is a compiler that converts a SQL Stored Procedure like language into C for SQLite. SQLite has no stored procedures of its own. CG/CQL can also generate other useful artifacts for testing and schema maintenance.

    There is https://github.com/ricomariani/CG-SQL-author which goes way beyond that, although you'll need to create the Rust generation yourself. You can play with it here with a Lua backend: https://mingodad.github.io/CG-SQL-Lua-playground/

    Also, I'm collecting several LALR(1) grammars here: https://mingodad.github.io/parsertl-playground/playground/ . It is a Yacc/Lex-compatible online editor/interpreter that can generate EBNF for railroad diagrams, SQL, and C++ from the grammars. Select "SQLite3 parser (partially working)" from "Examples", then click "Parse" to see the parse tree for the content in "Input source".

    I also created https://mingodad.github.io/plgh/json2ebnf.html to have a unified view of tree-sitter grammars, and https://mingodad.github.io/lua-wasm-playground/ , where there is a Lua script to generate an alternative EBNF for writing tree-sitter grammars that can later be converted to the standard "grammar.js".

  • rust-playground

    The Rust Playground

  • pest

    The Elegant Parser (by pest-parser)

    I'll throw in a plug for https://pest.rs/ a PEG-based parser-generator library in Rust. Delightful to work with and removes so much of the boilerplate involved in a parser.

  • pathlib

    Go's own object-oriented path library (by chigopher)

    That's fair, the concurrency features are very handy though optional of course.

    The ecosystem and tooling are great, probably the best I've worked with. But the main reason I reach for Go is that it's got tiny mental overhead. There's a handful of language features so it becomes obvious what to use, so you can focus on the actual goal of the project.

    There are some warts of course. Heavy IO code can be riddled with err checks (which is actually why I find it a bit awkward for servers). Similarly, the stdlib is quite verbose when doing file system manipulation. I may try https://github.com/chigopher/pathlib because Python's pathlib is by far my favourite interface.

  • grafbase

    The GraphQL Federation platform

  • go

    The Go programming language

    I think they're asking how the code in the Go runtime that implements the garbage collector, a core feature of the language, avoids needing the garbage collector to already exist to be able to run, being written in the language that it's a core feature of. I suspect the answer is just something like "by very carefully not using language features that might tempt the compiler to emit something that requires an allocation". I think it's a fair question as it's not really obvious that that's possible--do you just avoid calling make() and new() and forming pointers to local variables that might escape? Do you need to run on a magical goroutine that won't try to grow its stack with gc-allocated segments? Can you still use slices, closures, ...?

    I think the relevant code is https://github.com/golang/go/blob/master/src/runtime/mgc.go and adjacent files. I see some annotations like //go:systemstack, //go:nosplit, //go:nowritebarrier that are probably relevant, but I wouldn't know if there are any other specific requirements for that code.

  • salsa

    A generic framework for on-demand, incrementalized computation. Inspired by adapton, glimmer, and rustc's query system.

    I wrote my fair share of parsers last year, and the one I liked a lot is from the Salsa examples. You can find it here [0].

    [0] https://github.com/salsa-rs/salsa/blob/e4d36daf2dc4a09600975...

  • bagel-rs

    Yup! You can find it here: https://github.com/brundonsmith/bagel-rs/blob/master/src/mod...

    [trying to remind myself how this works because it's been a while]

    So it's got macros for defining "union types", which combine a bunch of individual structs into an enum with same-name variants, and implement From and TryFrom to box/unbox the structs in their group's enum

    ASTInner is a struct that holds the Any (all possible AST nodes) enum in its `details` field, alongside some other info we want all AST nodes to have

    And then AST<TKind> is a struct that holds (1) an Rc, and (2) a PhantomData<TKind>, where TKind is the (hierarchical) type of AST struct that it's known to contain

    AST can then be:

    1. Downcast to a TKind (basically just unboxing it)

    2. Upcast to an AST

    3. Recast to a different AST (changing the box's PhantomData type but not actually transforming the value). This uses trait implementations (implemented by the macros) to automatically know which parent types it can be "upwardly casted to", and which more-specific types it can try and be casted to

    The above three methods also have try_ versions

    What this means then is you can write functions against, eg, AST. You will have to pass an AST, but eg. an AST can be infallibly recast to an AST, but an AST can only try_recast to AST (returning an Option>)
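
    The layout described above can be sketched roughly as follows. This is not the bagel-rs code: the node structs (NumberLiteral, Identifier), the span field, and the method names are hypothetical, the From/TryFrom impls are written by hand instead of generated by macros, and the hierarchy is flattened to leaf structs plus Any. It assumes the wrapper is generic over TKind and that casts only retag the PhantomData.

```rust
use std::marker::PhantomData;
use std::rc::Rc;

// Hypothetical node structs; a real language would have many more.
#[derive(Debug, Clone, PartialEq)]
pub struct NumberLiteral(pub f64);

#[derive(Debug, Clone, PartialEq)]
pub struct Identifier(pub String);

// "Any": an enum with one same-name variant per node struct.
#[derive(Debug, Clone, PartialEq)]
pub enum Any {
    NumberLiteral(NumberLiteral),
    Identifier(Identifier),
}

// Hand-written here; the comment says macros generate these impls.
impl From<NumberLiteral> for Any {
    fn from(n: NumberLiteral) -> Self { Any::NumberLiteral(n) }
}
impl From<Identifier> for Any {
    fn from(i: Identifier) -> Self { Any::Identifier(i) }
}
impl TryFrom<Any> for NumberLiteral {
    type Error = ();
    fn try_from(a: Any) -> Result<Self, ()> {
        match a { Any::NumberLiteral(n) => Ok(n), _ => Err(()) }
    }
}
impl TryFrom<Any> for Identifier {
    type Error = ();
    fn try_from(a: Any) -> Result<Self, ()> {
        match a { Any::Identifier(i) => Ok(i), _ => Err(()) }
    }
}

// Shared data every node carries; `span` is an assumed example field.
pub struct ASTInner {
    pub details: Any,
    pub span: (usize, usize),
}

// The typed handle: a shared pointer plus a zero-sized type tag.
pub struct AST<TKind> {
    inner: Rc<ASTInner>,
    kind: PhantomData<TKind>,
}

impl<TKind> AST<TKind> {
    pub fn new(node: TKind, span: (usize, usize)) -> Self
    where
        TKind: Into<Any>,
    {
        AST { inner: Rc::new(ASTInner { details: node.into(), span }), kind: PhantomData }
    }

    pub fn span(&self) -> (usize, usize) { self.inner.span }

    /// Infallible: every node is an `Any`, so only the tag changes.
    pub fn upcast(&self) -> AST<Any> {
        AST { inner: Rc::clone(&self.inner), kind: PhantomData }
    }

    /// Fallible: retag only if the stored node really is a `T`.
    pub fn try_recast<T: TryFrom<Any>>(&self) -> Option<AST<T>> {
        if T::try_from(self.inner.details.clone()).is_ok() {
            Some(AST { inner: Rc::clone(&self.inner), kind: PhantomData })
        } else {
            None
        }
    }

    /// Unbox the concrete node. The tag guarantees the conversion succeeds,
    /// because `new`, `upcast` and `try_recast` are the only constructors.
    pub fn downcast(&self) -> TKind
    where
        TKind: TryFrom<Any>,
    {
        match TKind::try_from(self.inner.details.clone()) {
            Ok(node) => node,
            Err(_) => unreachable!("type tag does not match stored node"),
        }
    }
}
```

    With this sketch, AST::new(NumberLiteral(1.5), (0, 3)).upcast() gives an AST<Any>, and try_recast::<Identifier>() on it returns None while try_recast::<NumberLiteral>() returns Some. Cloning the node on each check keeps the sketch short; a real implementation would more likely match on a reference.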

    I worked in this codebase for a while and the dev experience was actually quite nice once I got all this set up. But figuring it out in the first place was a nightmare

    I'm wondering now if it would be possible/worthwhile to extract it into a crate


