Show HN: Regex Derivatives (Brzozowski Derivatives)

Our great sponsors

WorkOS - The modern identity platform for B2B SaaS

InfluxDB - Power Real-Time Data Analytics at Scale

SaaSHub - Software Alternatives and Reviews

Our great sponsors

brzozowski

4 82 5.8 Python

Brzozowski derivative python sketch
regex

91 3,336 9.1 Rust

An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

IIRC, that's what the Rust regex engine[1] currently uses
[1] https://github.com/rust-lang/regex

WorkOS

workos.com sponsored

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
redgrep

4 150 5.8 C++

♥ Janusz Brzozowski

I don't think Rust regex engine relies on this technique. I guess the main point is when you construct the DFA directly you still have the possibility of the exponential explosion of the number of states. That's why modern engines balance between NFA/DFA and lazy DFA.
Though there is an implementation that relies only on Brzozowski derivatives: https://github.com/google/redgrep

agda-regexp-automata

1 15 10.0 Agda

Formalization of Regular Languages in Agda: regular expressions, finite-state automata, proof of equivalence, proof of the pumping lemma.

And here the formalized proof in less than 150 lines of code (in Agda) for Brzozowski Derivatives for regex matching (and additional regular languages theorems): https://github.com/desi-ivanov/agda-regexp-automata

ocaml-re

2 225 5.8 OCaml

Pure OCaml regular expressions, with support for Perl and POSIX-style strings

Note that it's not difficult to (lazily or not) build a NFA using derivatives as well (with Antimirov's construction).
[1]: https://github.com/ocaml/ocaml-re/

regexp-Brzozowski

1 12 3.9 Coq

Coq formalization of decision procedures for regular expression equivalence [maintainer=@anton-trunov]

Thanks for sharing. I am not familiar with Agda. Will take a look. There is somewhat similar code in COQ: https://github.com/coq-community/regexp-Brzozowski

recross-coq

1 0 10.0 Coq

Regexp engine in Coq for solving regexp crosswords

I'm currently building a couple of regexp engines:
One, that's a formalization[0] in Coq with big-step semantics, which uncommonly has the intersection operator, and includes several equivalence relations and a proof of the pumping lemma, excepting one case (more on that below).
As a learning exercise and for historical reasons, I've also mostly ported Rust Cox's re1 engine to Rust[1], which includes VM matchers in the style of Henry Spencer, Ken Thompson, and Rob Pike. I also plan to port Doug McIlroy's engine[2], which is interesting for having intersection and complement and special handling for sublanguages, all the way down to just concatenation matched with Knuth-Morris-Pratt. I also want to examine the Rust (thanks burntsushi!), RE2, and Plan 9 engines in more depth.
Once I have time to get back to the project, I want to get back to my regular expression crossword puzzle solver. For that, I'm converting the hint regexps to DFAs, that match strings of some fixed length, and concatenating and intersecting them, until a single regexp is yielded, which should be a string literal, if the puzzle has a single solution. For backreferences, it's more tricky, but I plan on rewriting backreferences to the captured expression, where the lengths of both match, then either executing it with a stack like a pushdown automata or constructing a set of constraints on the characters by index.
As an aside: In my proof of the pumping lemma[3], I got stuck on the case for intersection and I'd love insight. Regular languages are closed under intersection, so the pumping lemma should hold for my implementation. I need to prove that if s =~ re1 and s =~ re2 can be pumped, then so can s =~ And re1 re2. My problem is that re1 and re2 split s into different substrings s = s11 ++ s12 ++ s13 = s21 ++ s22 ++ s23, then state that (forall n, s11 ++ repeat s12 n ++ s13 =~ re1) and (forall n, s21 ++ repeat s22 n ++ s23 =~ re2). My intuition is that s11 = s21, s12 = s22, and s13 = s23, because they both match for the intersection, but I'm not convinced of that and haven't been able to formulate a proof for that.
0: https://github.com/thaliaarchi/recross-coq

InfluxDB

www.influxdata.com sponsored

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
re1-rust

1 0 3.3 Rust

A port of re1, Russ Cox’s simple, virtual machine–based regular expression engine
mcilroy-regex

1 95 10.0 C++

Doug McIlroy's C++ regular expression matching library
.NET Runtime

606 14,047 10.0 C#

.NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.

[2]: https://github.com/dotnet/runtime/tree/main/src/libraries/Sy...

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project