regexp-Brzozowski
recross-coq
regexp-Brzozowski | recross-coq | |
---|---|---|
1 | 1 | |
12 | 0 | |
- | - | |
3.9 | 10.0 | |
7 months ago | over 1 year ago | |
Coq | Coq | |
MIT License | GNU General Public License v3.0 only |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
regexp-Brzozowski
-
Show HN: Regex Derivatives (Brzozowski Derivatives)
Thanks for sharing. I am not familiar with Agda. Will take a look. There is somewhat similar code in COQ: https://github.com/coq-community/regexp-Brzozowski
recross-coq
-
Show HN: Regex Derivatives (Brzozowski Derivatives)
I'm currently building a couple of regexp engines:
One, that's a formalization[0] in Coq with big-step semantics, which uncommonly has the intersection operator, and includes several equivalence relations and a proof of the pumping lemma, excepting one case (more on that below).
As a learning exercise and for historical reasons, I've also mostly ported Rust Cox's re1 engine to Rust[1], which includes VM matchers in the style of Henry Spencer, Ken Thompson, and Rob Pike. I also plan to port Doug McIlroy's engine[2], which is interesting for having intersection and complement and special handling for sublanguages, all the way down to just concatenation matched with Knuth-Morris-Pratt. I also want to examine the Rust (thanks burntsushi!), RE2, and Plan 9 engines in more depth.
Once I have time to get back to the project, I want to get back to my regular expression crossword puzzle solver. For that, I'm converting the hint regexps to DFAs, that match strings of some fixed length, and concatenating and intersecting them, until a single regexp is yielded, which should be a string literal, if the puzzle has a single solution. For backreferences, it's more tricky, but I plan on rewriting backreferences to the captured expression, where the lengths of both match, then either executing it with a stack like a pushdown automata or constructing a set of constraints on the characters by index.
As an aside: In my proof of the pumping lemma[3], I got stuck on the case for intersection and I'd love insight. Regular languages are closed under intersection, so the pumping lemma should hold for my implementation. I need to prove that if s =~ re1 and s =~ re2 can be pumped, then so can s =~ And re1 re2. My problem is that re1 and re2 split s into different substrings s = s11 ++ s12 ++ s13 = s21 ++ s22 ++ s23, then state that (forall n, s11 ++ repeat s12 n ++ s13 =~ re1) and (forall n, s21 ++ repeat s22 n ++ s23 =~ re2). My intuition is that s11 = s21, s12 = s22, and s13 = s23, because they both match for the intersection, but I'm not convinced of that and haven't been able to formulate a proof for that.
0: https://github.com/thaliaarchi/recross-coq
What are some alternatives?
analysis - Mathematical Components compliant Analysis Library
ocaml-re - Pure OCaml regular expressions, with support for Perl and POSIX-style strings
mcilroy-regex - Doug McIlroy's C++ regular expression matching library
math-comp - Mathematical Components
re1-rust - A port of re1, Russ Cox’s simple, virtual machine–based regular expression engine
brzozowski - Brzozowski derivative python sketch
agda-regexp-automata - Formalization of Regular Languages in Agda: regular expressions, finite-state automata, proof of equivalence, proof of the pumping lemma.
.NET Runtime - .NET is a cross-platform runtime for cloud, mobile, desktop, and IoT apps.
regex - An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.