recross-coq
redgrep
| | recross-coq | redgrep |
|---|---|---|
| Mentions | 1 | 4 |
| Stars | 0 | 150 |
| Growth | - | 0.7% |
| Activity | 10.0 | 5.8 |
| Last commit | over 1 year ago | 3 months ago |
| Language | Coq | C++ |
| License | GNU General Public License v3.0 only | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
recross-coq
- Show HN: Regex Derivatives (Brzozowski Derivatives)
I'm currently building a couple of regexp engines:
One, that's a formalization[0] in Coq with big-step semantics, which uncommonly has the intersection operator, and includes several equivalence relations and a proof of the pumping lemma, excepting one case (more on that below).
As a learning exercise and for historical reasons, I've also mostly ported Russ Cox's re1 engine to Rust[1], which includes VM matchers in the style of Henry Spencer, Ken Thompson, and Rob Pike. I also plan to port Doug McIlroy's engine[2], which is interesting for having intersection and complement and special handling for sublanguages, all the way down to plain concatenation matched with Knuth-Morris-Pratt. I also want to examine the Rust (thanks burntsushi!), RE2, and Plan 9 engines in more depth.
Once I have time, I want to get back to my regular expression crossword puzzle solver. For that, I'm converting the hint regexps to DFAs that match strings of some fixed length, then concatenating and intersecting them until a single regexp is yielded, which should be a string literal if the puzzle has a single solution. Backreferences are trickier, but I plan on rewriting each backreference to the captured expression, with the constraint that the lengths of both match, then either executing it with a stack like a pushdown automaton or constructing a set of constraints on the characters by index.
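The intersect-and-enumerate idea for fixed-length hints can be sketched as follows. This is a minimal illustration, not the poster's solver: it swaps explicit DFA construction for Brzozowski derivatives, ignores backreferences, and every name in it (`Chr`, `Cat`, `And`, `solutions`, and so on) is hypothetical.

```python
from dataclasses import dataclass

# Hypothetical regex AST for this sketch only.
@dataclass(frozen=True)
class Empty: pass              # matches no string
@dataclass(frozen=True)
class Eps: pass                # matches only the empty string
@dataclass(frozen=True)
class Chr:
    c: str
@dataclass(frozen=True)
class Cat:
    l: object
    r: object
@dataclass(frozen=True)
class Alt:
    l: object
    r: object
@dataclass(frozen=True)
class And:                     # intersection
    l: object
    r: object
@dataclass(frozen=True)
class Star:
    r: object

def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Chr)):
        return False
    if isinstance(r, Alt):
        return nullable(r.l) or nullable(r.r)
    return nullable(r.l) and nullable(r.r)  # Cat and And

def deriv(r, c):
    """Brzozowski derivative of r with respect to character c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Chr):
        return Eps() if r.c == c else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.l, c), deriv(r.r, c))
    if isinstance(r, And):
        return And(deriv(r.l, c), deriv(r.r, c))
    if isinstance(r, Star):
        return Cat(deriv(r.r, c), r)
    d = Cat(deriv(r.l, c), r.r)  # r is a Cat
    return Alt(d, deriv(r.r, c)) if nullable(r.l) else d

def solutions(r, length, alphabet, prefix=""):
    """All strings of exactly `length` characters matched by r."""
    if length == 0:
        return [prefix] if nullable(r) else []
    out = []
    for c in alphabet:
        out += solutions(deriv(r, c), length - 1, alphabet, prefix + c)
    return out

# Two hints constraining the same 2-cell word: (a|b)*b and ab*
h1 = Cat(Star(Alt(Chr("a"), Chr("b"))), Chr("b"))
h2 = Cat(Chr("a"), Star(Chr("b")))
result = solutions(And(h1, h2), 2, "ab")
```

When the intersected hints admit exactly one string of the required length, `result` holds a single literal. The search here is exhaustive over the alphabet, so it only suits small grids; it does no simplification of derivatives and no pruning of dead branches.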
As an aside: In my proof of the pumping lemma[3], I got stuck on the case for intersection and I'd love insight. Regular languages are closed under intersection, so the pumping lemma should hold for my implementation. I need to prove that if s =~ re1 and s =~ re2 can be pumped, then so can s =~ And re1 re2. My problem is that re1 and re2 split s into different substrings s = s11 ++ s12 ++ s13 = s21 ++ s22 ++ s23, then state that (forall n, s11 ++ repeat s12 n ++ s13 =~ re1) and (forall n, s21 ++ repeat s22 n ++ s23 =~ re2). My intuition is that s11 = s21, s12 = s22, and s13 = s23, because they both match for the intersection, but I'm not convinced of that and haven't been able to formulate a proof for that.
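A small concrete case suggests the two splits need not coincide. The sketch below uses Python's `re` module purely as a checker; the patterns `(aa)*` and `(aaa)*`, the string `aaaaaa`, and the helper `pumps` are illustrative assumptions, not part of the Coq development.

```python
import re

def pumps(pattern, s1, s2, s3, upto=6):
    """Check that s1 + s2*n + s3 fully matches pattern for n = 0..upto-1."""
    return all(re.fullmatch(pattern, s1 + s2 * n + s3) for n in range(upto))

re1, re2 = r"(aa)*", r"(aaa)*"   # s = "aaaaaa" matches both

# A split that pumps for re1 and a different split that pumps for re2.
split1_ok = pumps(re1, "", "aa", "aaaa")
split2_ok = pumps(re2, "", "aaa", "aaa")

# Neither split works for the other expression.
split1_on_re2 = pumps(re2, "", "aa", "aaaa")
split2_on_re1 = pumps(re1, "", "aaa", "aaa")

# A shared split does exist here, but it differs from both of the above.
shared_ok = pumps(re1, "", "aaaaaa", "") and pumps(re2, "", "aaaaaa", "")
```

Here the pumped part must have even length for `re1` and a length divisible by three for `re2`, so the splits handed back by the two hypotheses can differ. That suggests the intersection case likely needs an argument that builds a shared split, rather than one that assumes s11 = s21, s12 = s22, and s13 = s23.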
0: https://github.com/thaliaarchi/recross-coq
redgrep
- Show HN: Regex Derivatives (Brzozowski Derivatives)
I don't think the Rust regex engine relies on this technique. I guess the main point is that when you construct the DFA directly, you still risk an exponential explosion in the number of states. That's why modern engines balance between NFA, DFA, and lazy DFA.
Though there is an implementation that relies only on Brzozowski derivatives: https://github.com/google/redgrep
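The state explosion mentioned above can be seen on a textbook example. The sketch below is an assumption-laden illustration, unrelated to how redgrep or the Rust regex crate work internally: it hard-codes an NFA for `(a|b)*a(a|b)^(n-1)` ("the n-th character from the end is `a`") and counts the DFA states reached by subset construction. The function name `dfa_state_count` is hypothetical.

```python
def dfa_state_count(n, alphabet="ab"):
    """Count reachable DFA states for (a|b)*a(a|b)^(n-1) via subset construction.

    NFA states are 0..n: state 0 loops on every character and also moves
    to 1 on 'a'; state i (1 <= i < n) moves to i + 1 on any character;
    state n is accepting and has no outgoing transitions.
    """
    def step(states, c):
        nxt = set()
        for q in states:
            if q == 0:
                nxt.add(0)
                if c == "a":
                    nxt.add(1)
            elif q < n:
                nxt.add(q + 1)
        return frozenset(nxt)

    start = frozenset({0})
    seen, todo = {start}, [start]
    while todo:
        cur = todo.pop()
        for c in alphabet:
            nxt = step(cur, c)
            if nxt not in seen:
                seen.add(nxt)
                todo.append(nxt)
    return len(seen)

counts = {n: dfa_state_count(n) for n in (1, 3, 10)}
```

The NFA has n + 1 states, while the number of reachable DFA states doubles with each increment of n. A lazy DFA sidesteps this by building only the states that the input actually visits.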
- Introducing: Pomsky (formerly Rulex)
redgrep did it though: https://github.com/google/redgrep
- Redgrep – grep based on regex derivatives, matches in linear time
- Speed of Rust vs. C
I couldn't figure it out from looking through ripgrep's website: does ripgrep support intersection and complement of expressions, as e.g. https://github.com/google/redgrep does?
Regular languages are closed under those operations after all.
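The closure properties can be sketched on complete DFAs: complement flips the accepting set, and intersection runs both automata in lockstep (the product construction). This is a generic illustration and says nothing about what ripgrep or redgrep support; the dictionary layout and the function names are assumptions made for the example.

```python
from itertools import product

def make_dfa(states, delta, start, accept):
    """A complete DFA: delta maps (state, char) to the next state."""
    return {"states": set(states), "delta": delta,
            "start": start, "accept": set(accept)}

def accepts(d, s):
    q = d["start"]
    for c in s:
        q = d["delta"][q, c]
    return q in d["accept"]

def complement(d):
    """Same automaton with accepting and rejecting states swapped."""
    return {**d, "accept": d["states"] - d["accept"]}

def intersect(d1, d2, alphabet):
    """Product construction: a pair accepts iff both components accept."""
    states = set(product(d1["states"], d2["states"]))
    delta = {((p, q), c): (d1["delta"][p, c], d2["delta"][q, c])
             for (p, q) in states for c in alphabet}
    accept = {(p, q) for (p, q) in states
              if p in d1["accept"] and q in d2["accept"]}
    return make_dfa(states, delta, (d1["start"], d2["start"]), accept)

# Even number of 'a's.
even_a = make_dfa({0, 1}, {(0, "a"): 1, (1, "a"): 0,
                           (0, "b"): 0, (1, "b"): 1}, 0, {0})
# Ends with 'b'.
ends_b = make_dfa({0, 1}, {(0, "a"): 0, (1, "a"): 0,
                           (0, "b"): 1, (1, "b"): 1}, 0, {1})

both = intersect(even_a, ends_b, "ab")
even_a_not_ends_b = intersect(even_a, complement(ends_b), "ab")
```

Complementing this way requires a complete DFA, one with a transition for every state and character; on an NFA, flipping the accepting states does not in general yield the complement.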
What are some alternatives?
ocaml-re - Pure OCaml regular expressions, with support for Perl and POSIX-style strings
ixy-languages - A high-speed network driver written in C, Rust, C++, Go, C#, Java, OCaml, Haskell, Swift, Javascript, and Python
mcilroy-regex - Doug McIlroy's C++ regular expression matching library
smartstring - Compact inlined strings for Rust.
re1-rust - A port of re1, Russ Cox’s simple, virtual machine–based regular expression engine
libskry_r - Lucky imaging library
brzozowski - Brzozowski derivative Python sketch
barre - A Regular Expression Library and CFG parser for Rust using Brzozowski Derivatives
regexp-Brzozowski - Coq formalization of decision procedures for regular expression equivalence [maintainer=@anton-trunov]
fst - Represent large sets and maps compactly with finite state transducers.
agda-regexp-automata - Formalization of Regular Languages in Agda: regular expressions, finite-state automata, proof of equivalence, proof of the pumping lemma.
ixy - A simple yet fast user space network driver for Intel 10 Gbit/s NICs written from scratch