Regex Engine Internals as a Library

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rust-memchr

    Optimized string search routines for Rust.

  • I actually have an M2 mac mini in the mail from Apple for exactly this purpose.

    My time horizon is very long. It takes me a long time to do things these days.

    It has never been true that I don't want to support it. Merely that it is difficult to verify and test. There is also the problem that the port from x86 to ARM is not straightforward, due to both my own ignorance and what I believe are important missing vector operations such as movemask.

    This is discussed a bit more here (including the bit about movemask): https://github.com/BurntSushi/memchr/issues/76

  • Regex101.com-offline-app

    Use regex101.com offline

  • It seems like the site doesn't really need a server so people have made offline versions.

    So you should be able to burn something like [1] to a disk.

    [1]: https://github.com/ibaaj/Regex101.com-offline-app/pull/1/fil...

  • regex

    An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

  • https://www.cs.princeton.edu/courses/archive/fall19/cos226/l... and https://kean.blog/post/lets-build-regex are excellent introductions to implementing a (very) simplified regex engine: construct a nondeterministic finite automaton (NFA) for the regex, then perform a graph search on the resulting digraph; if the vertex corresponding to your end state is reachable, you have a match.

    I think this exercise is valuable for anyone writing regexes to not only understand that there's less magic than one might think, but also to visualize a bunch of balls bouncing along an NFA - that bug you inevitably hit in production due to catastrophic backtracking now takes on a physical meaning!

    Separately re: the OP, https://github.com/rust-lang/regex/issues/822 (and specifically BurntSushi's comment at the very end of the issue) adds really useful context to the paragraph in the OP about niche APIs: https://blog.burntsushi.net/regex-internals/#problem-request... - searching with multiple regexes simultaneously against a text is both incredibly complex and incredibly useful, and I can't wait to see what the community comes up with for this pattern!

  • ILSpy

    .NET Decompiler with support for PDB generation, ReadyToRun, Metadata (&more) - cross-platform!
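
The "construct an NFA, then graph-search it" exercise described in the regex entry above can be sketched in a few dozen lines of Rust. This is a minimal illustration, not how the regex crate works internally: the `Nfa` type and its `new`, `closure`, and `is_match` methods are hypothetical names, the construction follows the usual textbook scheme (one state per pattern character plus an accept state), and it assumes a deliberately tiny syntax: literals, `.`, `*`, parentheses, and at most one `|` per group. It tests whether the whole text matches rather than searching for a substring.

```rust
/// Textbook-style NFA: state i corresponds to pattern character i,
/// and state `re.len()` is the accept state.
struct Nfa {
    re: Vec<char>,
    eps: Vec<Vec<usize>>, // epsilon (no-input) edges of the digraph
}

impl Nfa {
    fn new(pattern: &str) -> Nfa {
        // Wrap in parentheses so a top-level `|` is handled like any group.
        let re: Vec<char> = format!("({})", pattern).chars().collect();
        let m = re.len();
        let mut eps = vec![Vec::new(); m + 1];
        let mut ops: Vec<usize> = Vec::new(); // positions of pending `(` and `|`
        for i in 0..m {
            let mut lp = i; // start of the sub-pattern that ends at i
            match re[i] {
                '(' | '|' => ops.push(i),
                ')' => {
                    let or = ops.pop().expect("unbalanced ')'");
                    if re[or] == '|' {
                        // Assumption: only one `|` per group is supported.
                        lp = ops.pop().expect("'|' outside a group");
                        eps[lp].push(or + 1); // enter the right alternative
                        eps[or].push(i); // leave the left alternative
                    } else {
                        lp = or;
                    }
                }
                _ => {}
            }
            // Closure: `x*` may skip x or loop back to its start.
            if i + 1 < m && re[i + 1] == '*' {
                eps[lp].push(i + 1);
                eps[i + 1].push(lp);
            }
            // Metacharacters consume no input; just step past them.
            if matches!(re[i], '(' | '*' | ')') {
                eps[i].push(i + 1);
            }
        }
        Nfa { re, eps }
    }

    /// Depth-first search: every state reachable through epsilon edges.
    fn closure(&self, starts: &[usize]) -> Vec<bool> {
        let mut seen = vec![false; self.eps.len()];
        let mut stack: Vec<usize> = starts.to_vec();
        while let Some(v) = stack.pop() {
            if seen[v] {
                continue;
            }
            seen[v] = true;
            stack.extend(self.eps[v].iter().copied());
        }
        seen
    }

    /// True if the entire text is matched by the pattern.
    fn is_match(&self, text: &str) -> bool {
        let m = self.re.len();
        let mut active = self.closure(&[0]);
        for c in text.chars() {
            let mut next = Vec::new();
            for v in 0..m {
                let is_meta = matches!(self.re[v], '(' | ')' | '*' | '|');
                if active[v] && !is_meta && (self.re[v] == c || self.re[v] == '.') {
                    next.push(v + 1); // consume c and advance one state
                }
            }
            active = self.closure(&next);
        }
        active[m] // match iff the accept state is reachable
    }
}
```

For example, `Nfa::new("(a|b)*c").is_match("abbac")` returns `true`, while the same pattern rejects `"abba"`. Because the simulation tracks the whole set of active states at each input character instead of trying one path at a time, its running time is bounded by (text length) × (pattern length), which is the property that backtracking engines give up.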

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
