Calculate the difference and intersection of any two regexes

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • myrex

    Match regular expressions using NFA process networks (Elixir)

  • Another interesting question is how many possible successful matches there are for a given input string. For example:

    How many ways can (a?){m}(a*){m} match the string a{m}?

    i.e. the input is m repetitions of the letter 'a'.

    https://github.com/mike-french/myrex#ambiguous-example
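To make the counting question concrete, here is a minimal sketch that counts matches for the pattern (a?){m}(a*){m}, read here with a starred second group, against m copies of 'a'. It assumes that a "way" means an assignment of how many characters each of the 2m groups consumes; myrex may define ambiguity differently, and the function name count_matches is made up for this illustration.

```python
from functools import lru_cache


def count_matches(m: int) -> int:
    """Count the ways (a?){m}(a*){m} can match 'a' * m.

    Assumption: a "way" is one assignment of consumed-character counts
    to the 2m groups. Each a? group takes 0 or 1 characters and each
    a* group takes any number of the characters still remaining.
    """
    slots = ["opt"] * m + ["star"] * m

    @lru_cache(maxsize=None)
    def ways(i: int, remaining: int) -> int:
        # All groups used: a match only if the whole input was consumed.
        if i == len(slots):
            return 1 if remaining == 0 else 0
        if slots[i] == "opt":
            choices = range(0, min(1, remaining) + 1)
        else:
            choices = range(0, remaining + 1)
        return sum(ways(i + 1, remaining - k) for k in choices)

    return ways(0, m)


if __name__ == "__main__":
    for m in range(1, 4):
        print(m, count_matches(m))  # m=1 -> 2, m=2 -> 8, m=3 -> 38
```

The count grows quickly with m, which is why patterns like this are a common stress test for matchers that enumerate every successful path.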

  • dictomaton

    Finite state dictionaries in Java

  • Say you want to compute all strings of length 5 that the automaton can generate. Conceptually the nicest way is to create an automaton that matches any five characters and then compute the intersection between that automaton and the regex automaton. Then you can generate all the strings in the intersection automaton. Of course, IRL, you wouldn't actually materialize the intersection automaton (you can easily compute it on the fly), but you get the idea.

    Automata are really a lost art in modern natural language processing. We used to do things like store a large vocabulary in a deterministic acyclic minimized automaton (nice and compact, a so-called dictionary automaton). And then to find, say, all words within Levenshtein distance 2 of "hacker", create a Levenshtein automaton for "hacker" and then compute (on the fly) the intersection between the Levenshtein automaton and the dictionary automaton. The language of the intersection automaton is then exactly the set of dictionary words within that distance.

    I wrote a Java package a decade ago that implements some of this stuff:

    https://github.com/danieldk/dictomaton
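The on-the-fly intersection described above can be sketched roughly as follows. This is not dictomaton's API: the dictionary is a plain trie rather than a minimized automaton, and the Levenshtein automaton is simulated by carrying one row of the edit-distance table as its state. All names here (Node, build_trie, step, within_distance) are hypothetical.

```python
class Node:
    """Trie node, standing in for a dictionary-automaton state."""

    def __init__(self):
        self.children = {}
        self.final = False


def build_trie(words):
    root = Node()
    for word in words:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, Node())
        node.final = True
    return root


def step(row, ch, target):
    """Advance the simulated Levenshtein automaton by one character.

    row[j] is the edit distance between the prefix read so far and
    target[:j]; the returned list is the row after also reading ch.
    """
    new = [row[0] + 1]
    for j in range(1, len(target) + 1):
        cost = 0 if target[j - 1] == ch else 1
        new.append(min(new[j - 1] + 1, row[j] + 1, row[j - 1] + cost))
    return new


def within_distance(root, target, k):
    """Walk both automata in lockstep, exploring only live state pairs."""
    results = []
    stack = [(root, list(range(len(target) + 1)), "")]
    while stack:
        node, row, prefix = stack.pop()
        if node.final and row[-1] <= k:
            results.append(prefix)
        for ch, child in node.children.items():
            nxt = step(row, ch, target)
            # min(nxt) is a lower bound on the distance of any extension,
            # so state pairs above k can be pruned.
            if min(nxt) <= k:
                stack.append((child, nxt, prefix + ch))
    return sorted(results)


if __name__ == "__main__":
    vocab = ["hacker", "hackers", "backer", "hack", "hawker", "python", "hat"]
    print(within_distance(build_trie(vocab), "hacker", 2))
    # ['backer', 'hack', 'hacker', 'hackers', 'hawker']
```

Only pairs of (dictionary state, Levenshtein state) that are actually reachable get visited, so the product automaton is never built in full.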

  • cant

    A programming argot

  • That was one of the short examples in Norvig's Python program-design course for Udacity. https://github.com/darius/cant/blob/master/library/regex-gen... (I don't have the Python handy.)
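Assuming "that" refers to enumerating the strings a regex can match, a rough Python sketch of the idea might look like this. It is neither Norvig's course code nor the linked cant library; the combinator names (lit, alt, seq, star) and the helper strings_of_length are chosen for illustration only. Each pattern is a function from a maximum length n to the set of matching strings of length at most n.

```python
def lit(s):
    """Match exactly the literal string s."""
    return lambda n: {s} if len(s) <= n else set()


def alt(x, y):
    """Match whatever x or y matches."""
    return lambda n: x(n) | y(n)


def seq(x, y):
    """Match x followed by y, within a total length budget of n."""
    def gen(n):
        out = set()
        for a in x(n):
            for b in y(n - len(a)):
                out.add(a + b)
        return out
    return gen


def star(x):
    """Match zero or more repetitions of x, up to length n."""
    def gen(n):
        result = {""}
        frontier = {""}
        while frontier:
            new = set()
            for a in frontier:
                for b in x(n - len(a)):
                    # Skip empty matches so every round adds characters
                    # and the loop terminates.
                    if b and a + b not in result:
                        new.add(a + b)
            result |= new
            frontier = new
        return result
    return gen


def strings_of_length(pattern, n):
    """All strings of exactly length n generated by pattern."""
    return sorted(s for s in pattern(n) if len(s) == n)


if __name__ == "__main__":
    # (a|b)*c
    pattern = seq(star(alt(lit("a"), lit("b"))), lit("c"))
    print(strings_of_length(pattern, 3))  # ['aac', 'abc', 'bac', 'bbc']
```

Bounding the length up front keeps the enumeration finite even for starred patterns, which plays the same role as intersecting with a fixed-length automaton.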
