myrex | lexer
---|---
4 | 1
4 | 0
- | -
10.0 | -
over 1 year ago | over 4 years ago
Elixir | C
GNU General Public License v3.0 or later | -
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
myrex
-
Re2c
Concurrent parallel execution of NFA directly in Elixir:
https://github.com/mike-french/myrex
It is concurrent in several senses: a single match is split into many concurrent traversals of the network; multiple input strings can be matched concurrently within the same network; and generators can also run concurrently in the network. This is possible because all state is in the traversal messages, not in the process nodes, and the whole thing runs asynchronously (non-blocking) in parallel, automatically using all cores in the machine.
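To make the "all state is in the traversal messages" idea concrete, here is a rough sequential sketch in Python. It is not Myrex's code: the `NODES` table, the `Traversal` tuple and `run` are hypothetical names, the NFA for `a?a*` is hand-built, and a plain queue stands in for BEAM processes and message passing.

```python
from collections import deque
from typing import NamedTuple


class Traversal(NamedTuple):
    """A message in flight: it carries all the match state."""
    node: str  # which NFA node receives the message
    pos: int   # how many input characters have been consumed


# Hand-built NFA for a?a* (hypothetical node names). Nodes hold no mutable
# state: each maps to a list of (label, next_node); None is an empty move.
NODES = {
    "start": [("a", "star"), (None, "star")],    # a? : take an 'a' or skip it
    "star": [("a", "star"), (None, "accept")],   # a* : loop on 'a' or leave
    "accept": [],
}


def run(text: str) -> int:
    """Count the traversals that reach 'accept' with all input consumed."""
    queue = deque([Traversal("start", 0)])
    successes = 0
    while queue:
        t = queue.popleft()
        if t.node == "accept":
            if t.pos == len(text):
                successes += 1
            continue
        # Fan out: every enabled edge spawns an independent traversal.
        for label, nxt in NODES[t.node]:
            if label is None:
                queue.append(Traversal(nxt, t.pos))
            elif t.pos < len(text) and text[t.pos] == label:
                queue.append(Traversal(nxt, t.pos + 1))
    return successes
```

Because no node remembers anything between messages, the queue could in principle be replaced by independent workers, and several input strings could share the same node table.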
> you see how regex syntax compiles down to various configurations of automata
That is Thompson's Construction [1]. The Myrex README contains a long description of how regex structures map to small process networks, and how they glue together. The final process network is a direct 1-1 representation of the NFA.
[1] Russ Cox has a nice explanation https://swtch.com/~rsc/regexp/regexp1.html
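For readers who have not seen the construction, a minimal Python sketch follows. It shows the general textbook technique, not the Myrex process network: `State`, `lit`, `seq`, `alt`, `star` and `matches` are names made up for this illustration, and fragments are plain `(start, accept)` pairs that are glued together by adding empty transitions.

```python
class State:
    """NFA state: at most one labelled edge plus any number of empty edges."""

    def __init__(self):
        self.char = None  # label of the edge to self.out, if any
        self.out = None
        self.eps = []     # empty (epsilon) transitions


def lit(c):
    """Fragment matching the single character c."""
    s, a = State(), State()
    s.char, s.out = c, a
    return s, a


def seq(f, g):
    """Fragment for f followed by g: glue f's accept to g's start."""
    f[1].eps.append(g[0])
    return f[0], g[1]


def alt(f, g):
    """Fragment for f|g: a new start forks, a new accept joins."""
    s, a = State(), State()
    s.eps += [f[0], g[0]]
    f[1].eps.append(a)
    g[1].eps.append(a)
    return s, a


def star(f):
    """Fragment for f*: skip f entirely, or loop back after each pass."""
    s, a = State(), State()
    s.eps += [f[0], a]
    f[1].eps += [f[0], a]
    return s, a


def closure(states):
    """All states reachable from `states` through empty transitions."""
    seen, stack = set(), list(states)
    while stack:
        st = stack.pop()
        if st not in seen:
            seen.add(st)
            stack.extend(st.eps)
    return seen


def matches(frag, text):
    """Simulate the NFA on the whole of `text`, tracking a set of states."""
    start, accept = frag
    current = closure([start])
    for ch in text:
        current = closure([s.out for s in current if s.char == ch])
    return accept in current
```

For example, `seq(lit("a"), star(alt(lit("b"), lit("c"))))` builds an NFA for `a(b|c)*`. Each combinator mutates the fragments it is given, so a fragment should be used in only one place.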
-
Calculate the difference and intersection of any two regexes
Another interesting question is how many possible successful matches there are for a given input string. For example:
How many ways can (a?){m}(a*){m} match the string a{m},
i.e. an input of m repetitions of the letter 'a'?
https://github.com/mike-french/myrex#ambiguous-example
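One way to get a feel for the numbers is a small counting sketch in Python. It rests on two assumptions that are not stated in the comment above: that the second group is meant to be starred, i.e. `(a*){m}`, and that a "way" means a distinct assignment of how many letters each of the 2m groups consumes. Myrex may count traversals differently, and `count_splits` is a hypothetical helper, not part of any library.

```python
def count_splits(m: int) -> int:
    """Count the ways m optional groups (a?) followed by m starred groups (a*)
    can divide up an input of exactly m letters 'a'."""
    # ways[j] = number of ways the groups handled so far consume j letters
    ways = [1] + [0] * m

    # Each a? group consumes either 0 or 1 letter.
    for _ in range(m):
        ways = [ways[j] + (ways[j - 1] if j > 0 else 0) for j in range(m + 1)]

    # Each a* group consumes any number of letters, so the new count for j
    # is the running total of the old counts for 0..j.
    for _ in range(m):
        total, new = 0, []
        for j in range(m + 1):
            total += ways[j]
            new.append(total)
        ways = new

    return ways[m]
```

Under these assumptions `count_splits(1)` is 2 and `count_splits(2)` is 8, and the count grows quickly with m, which is why this kind of pattern makes a good stress test for an engine that reports every match.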
-
Programming Techniques: Regular expression search algorithm (1968)
This is Thompson's Construction.
There is a nice description given by Russ Cox:
https://swtch.com/~rsc/regexp/regexp1.html
This project has an interesting implementation in Elixir, which converts the NFA directly into a process network:
https://github.com/mike-french/myrex
The network runs all possible traversals concurrently, and automatically scales to use all cores (Erlang BEAM runtime). Multiple input strings can also be processed concurrently. It can also generate matching strings concurrently (Monte Carlo). It implements captures and Unicode character sets.
While it is designed for concurrency, it is not meant to be the fastest regex implementation. There is an example of a highly ambiguous match that launches 900k traversals and reports all capture results in about 10s.
lexer
-
Re2c
One fun project in this vein is to DIY something similar to this. To simplify things initially, you can stick to NFAs and use an existing library to parse the regex syntax rather than parsing it yourself.
The aha moment comes when you see how regex syntax compiles down to various configurations of automata. Couple that with the fact that automata are made to be composed together well, and the result is beautiful in a way that you rarely see in production code.
Here's my stab at it in Rust: https://github.com/mattgreen/lexer/tree/master/src/nfa
What are some alternatives?
cant - A programming argot
dictomaton - Finite state dictionaries in Java