Re2c



oil

234 2,720 9.9 Python

Oils is our upgrade path from bash to a better language and runtime. It's also for Python and JavaScript users who avoid shell!

This is sort of a category error.
re2c is a lexer generator, while YAML and Python are recursive/nested formats.
You can definitely use re2c to lex them, but that's not the whole solution: you still need a parser on top to handle the nesting.
I use re2c for everything I can in https://www.oilshell.org, and it's amazing. It greatly reduces the amount of fiddly C code you need to write to parse languages, and it drops in anywhere.
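This is a minimal Python sketch of the lexer/parser split. It is not re2c. Python's `re` module stands in for a generated lexer, and the bracket-list syntax, the token names, and the `tokenize`/`parse` functions are hypothetical. The lexer only ever emits a flat token stream. Recovering the nesting takes a separate recursive step.

```python
import re
from typing import List, Tuple, Union

# Token rules, one regex per kind. A lexer generator such as re2c would compile
# rules like these into a single DFA; here Python's `re` stands in for it.
TOKEN_SPEC = [
    ("LBRACKET", r"\["),
    ("RBRACKET", r"\]"),
    ("COMMA", r","),
    ("WORD", r"[A-Za-z0-9_]+"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

Token = Tuple[str, str]
Node = Union[str, list]


def tokenize(text: str) -> List[Token]:
    """Lexing: characters in, a FLAT list of (kind, text) tokens out."""
    tokens: List[Token] = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "SKIP":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


def parse(tokens: List[Token]) -> Node:
    """Parsing: recursion rebuilds the nesting that the flat token stream lacks."""
    pos = 0

    def value() -> Node:
        nonlocal pos
        if pos >= len(tokens):
            raise SyntaxError("unexpected end of input")
        kind, text = tokens[pos]
        pos += 1
        if kind == "WORD":
            return text
        if kind != "LBRACKET":
            raise SyntaxError(f"unexpected token {text!r}")
        items: list = []
        if pos < len(tokens) and tokens[pos][0] == "RBRACKET":
            pos += 1  # empty list: []
            return items
        while True:
            items.append(value())  # the recursive call handles arbitrarily deep nesting
            if pos >= len(tokens):
                raise SyntaxError("unclosed '['")
            kind, text = tokens[pos]
            pos += 1
            if kind == "RBRACKET":
                return items
            if kind != "COMMA":
                raise SyntaxError(f"expected ',' or ']', got {text!r}")

    result = value()
    if pos != len(tokens):
        raise SyntaxError("trailing tokens after value")
    return result
```

For example, `parse(tokenize("[a, [b, c]]"))` returns `['a', ['b', 'c']]`. A generated lexer could replace `tokenize` without touching `parse`.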

lexer

1 0 - C

A batteries-included lexing library based on finite automata (by mattgreen)

One fun project in this vein is to DIY something similar to this. To simplify things initially, you can use NFAs and lean on an existing library to parse the regex syntax, then build the automata yourself.
The aha moment comes when you see how regex syntax compiles down to various configurations of automata. Couple that with the fact that automata are made to be composed together well, and the result is beautiful in a way that you rarely see in production code.
Here's my stab at it in Rust: https://github.com/mattgreen/lexer/tree/master/src/nfa
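This is a minimal Python sketch of that idea. It assumes Thompson-style fragments, where each fragment has exactly one start state and one accept state. The `Builder` class and its `lit`/`cat`/`alt`/`star` methods are hypothetical names, not the API of mattgreen/lexer. The sketch also skips regex parsing entirely and assembles the pattern by calling the combinators directly. Every fragment has a single entry and a single exit, so composing two fragments only requires adding epsilon edges.

```python
from typing import List, Optional, Set, Tuple

Frag = Tuple[int, int]  # (start state, accept state) of one NFA fragment


class Builder:
    """Builds Thompson-style NFA fragments inside one shared state table.

    Use each fragment once: composing a fragment adds edges to its states.
    """

    def __init__(self) -> None:
        # edges[q] lists (label, target); a label of None is an epsilon edge.
        self.edges: List[List[Tuple[Optional[str], int]]] = []

    def _state(self) -> int:
        self.edges.append([])
        return len(self.edges) - 1

    def lit(self, c: str) -> Frag:
        s, a = self._state(), self._state()
        self.edges[s].append((c, a))
        return s, a

    def cat(self, x: Frag, y: Frag) -> Frag:
        # x then y: x's accept state flows into y's start state.
        self.edges[x[1]].append((None, y[0]))
        return x[0], y[1]

    def alt(self, x: Frag, y: Frag) -> Frag:
        # x | y: a new start branches to both, and both feed a new accept.
        s, a = self._state(), self._state()
        self.edges[s] += [(None, x[0]), (None, y[0])]
        self.edges[x[1]].append((None, a))
        self.edges[y[1]].append((None, a))
        return s, a

    def star(self, x: Frag) -> Frag:
        # x*: skip x entirely, or run it and loop back for another round.
        s, a = self._state(), self._state()
        self.edges[s] += [(None, x[0]), (None, a)]
        self.edges[x[1]] += [(None, x[0]), (None, a)]
        return s, a

    def _closure(self, states: Set[int]) -> Set[int]:
        # All states reachable through epsilon edges alone.
        seen, stack = set(states), list(states)
        while stack:
            for label, t in self.edges[stack.pop()]:
                if label is None and t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen

    def matches(self, frag: Frag, text: str) -> bool:
        # Simulate the NFA by tracking the set of live states (no backtracking).
        current = self._closure({frag[0]})
        for ch in text:
            moved = {t for q in current for label, t in self.edges[q] if label == ch}
            current = self._closure(moved)
        return frag[1] in current


# Example: (a|b)*abb assembled from the combinators.
nfa = Builder()
pattern = nfa.cat(
    nfa.cat(nfa.cat(nfa.star(nfa.alt(nfa.lit("a"), nfa.lit("b"))), nfa.lit("a")), nfa.lit("b")),
    nfa.lit("b"),
)
print(nfa.matches(pattern, "aababb"), nfa.matches(pattern, "abba"))  # True False
```

Matching tracks a set of states instead of backtracking. Each input character therefore costs at most one pass over the states and edges.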

rebar

22 197 8.5 Python

A biased barometer for gauging the relative speed of some regex engines on a curated set of tasks.

They are extremely fast too: https://github.com/BurntSushi/rebar?tab=readme-ov-file#summa...

myrex

4 4 10.0 Elixir

Match regular expressions using NFA process networks (Elixir)

Concurrent, parallel execution of an NFA directly in Elixir:
https://github.com/mike-french/myrex
It is concurrent in both senses: a single match is split into many concurrent traversals of the network, and multiple input strings can be matched concurrently within the same network; generators can also run concurrently in the network. This is possible because all state is in the traversal messages, not in the process nodes, and the whole thing runs asynchronously (non-blocking) in parallel, automatically using all cores in the machine.
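This is a rough, single-threaded Python imitation of the idea that state lives in the messages. Myrex uses Elixir processes. In this sketch a `deque` stands in for the process mailboxes, and the node classes, `Msg`, `run`, and the hand-built network are all hypothetical. The nodes are immutable, so messages for several input strings can be interleaved in the same network without interfering with each other.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Dict, List, Tuple, Union


@dataclass(frozen=True)
class Msg:
    """All traversal state travels in the message, never in the nodes."""
    input_id: int
    text: str
    pos: int


@dataclass(frozen=True)
class Char:
    ch: str
    nxt: str  # name of the node to forward to after consuming `ch`


@dataclass(frozen=True)
class Split:
    a: str
    b: str  # a copy of the message goes down both branches


@dataclass(frozen=True)
class Accept:
    pass


Node = Union[Char, Split, Accept]


def run(network: Dict[str, Node], start: str, inputs: List[str]) -> Dict[int, bool]:
    # One shared queue of (node name, message) pairs stands in for process mailboxes.
    # Messages for every input string are interleaved in the same network.
    # Assumes no Split can reach itself without passing through a Char first.
    queue: Deque[Tuple[str, Msg]] = deque(
        (start, Msg(i, s, 0)) for i, s in enumerate(inputs)
    )
    matched = {i: False for i in range(len(inputs))}
    while queue:
        name, msg = queue.popleft()
        node = network[name]
        if isinstance(node, Char):
            if msg.pos < len(msg.text) and msg.text[msg.pos] == node.ch:
                queue.append((node.nxt, Msg(msg.input_id, msg.text, msg.pos + 1)))
            # otherwise this traversal simply dies
        elif isinstance(node, Split):
            queue.append((node.a, msg))
            queue.append((node.b, msg))
        elif isinstance(node, Accept) and msg.pos == len(msg.text):
            matched[msg.input_id] = True  # succeed only at end of input
    return matched


# Hand-built network for ab*c; the nodes hold only wiring, no per-match state.
NETWORK: Dict[str, Node] = {
    "start": Char("a", "loop"),
    "loop": Split("b", "c"),
    "b": Char("b", "loop"),
    "c": Char("c", "end"),
    "end": Accept(),
}
print(run(NETWORK, "start", ["ac", "abbc", "abd", "bc"]))
# {0: True, 1: True, 2: False, 3: False}
```

The nodes never change, so each one could run as a separate process or thread. Here the deque just serializes message delivery to keep the sketch deterministic.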
> you see how regex syntax compiles down to various configurations of automata
That is Thompson's Construction [1]. The Myrex README contains a long description of how regex structures map to small process networks, and how they glue together. The final process network is a direct 1-1 representation of the NFA.
[1] Russ Cox has a nice explanation https://swtch.com/~rsc/regexp/regexp1.html

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.


This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com (post date: 22 Feb 2024).

