Regex Engine Internals as a Library

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • rust-memchr

    Optimized string search routines for Rust.

    I actually have an M2 mac mini in the mail from Apple for exactly this purpose.

    My time horizon is very long. It takes me a long time to do things these days.

    It has never been true that I don't want to support it, merely that it is difficult to verify and test. There is also the problem that the port from x86 to ARM is not straightforward, due to both my own ignorance and what I believe are important missing vector operations such as movemask.

    This is discussed a bit more here (including the bit about movemask): https://github.com/BurntSushi/memchr/issues/76
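
For readers unfamiliar with the movemask operation mentioned above, here is a scalar sketch of the idea. On x86, SSE2 exposes it as `_mm_movemask_epi8`, which packs the high bit of each byte lane of a vector into an integer. The functions `movemask` and `find_byte` below are hypothetical names that emulate this in plain Python for illustration only; this is not the memchr crate's code, and real implementations work on hardware vector registers rather than Python lists.

```python
def movemask(lanes):
    """Pack the high bit of each byte lane into one integer (lane 0 -> bit 0)."""
    mask = 0
    for i, lane in enumerate(lanes):
        mask |= ((lane >> 7) & 1) << i
    return mask


def find_byte(haystack, needle, width=16):
    """Return the index of the first byte equal to `needle`, or None.

    Each chunk stands in for one vector register of `width` byte lanes.
    """
    for start in range(0, len(haystack), width):
        chunk = haystack[start:start + width]
        # Emulate a lane-wise equality compare: 0xFF where equal, 0x00 otherwise.
        eq = [0xFF if b == needle else 0x00 for b in chunk]
        mask = movemask(eq)
        if mask:
            # The lowest set bit of the mask is the first matching lane.
            return start + (mask & -mask).bit_length() - 1
    return None
```

The point of the mask is that a single integer test (`if mask`) answers "did any lane match?", and a bit scan then gives the position. Without a direct equivalent instruction, a port has to find another way to turn a vector comparison result into a position.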

  • Regex101.com-offline-app

    use regex101.com offline

    It seems like the site doesn't really need a server so people have made offline versions.

    So you should be able to burn something like [1] to a disk.

    [1]: https://github.com/ibaaj/Regex101.com-offline-app/pull/1/fil...

  • regex

    An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

    https://www.cs.princeton.edu/courses/archive/fall19/cos226/l... and https://kean.blog/post/lets-build-regex are excellent introductions to implementing a (very) simplified regex engine: construct a nondeterministic finite automaton (NFA) for the regex, then perform a graph search on the resulting digraph; if the vertex corresponding to your end state is reachable, you have a match.

    I think this exercise is valuable for anyone writing regexes to not only understand that there's less magic than one might think, but also to visualize a bunch of balls bouncing along an NFA - that bug you inevitably hit in production due to catastrophic backtracking now takes on a physical meaning!
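
The construction and graph search described in the comment above can be sketched roughly as follows. This is a minimal illustration assuming a well-formed pattern that uses only literals, `.`, `*`, `|`, and parentheses; it follows the general shape of the textbook epsilon-digraph approach and is not how the Rust regex crate is implemented. The names `build_epsilon_graph`, `reachable`, and `matches` are hypothetical.

```python
def build_epsilon_graph(regex):
    """Build epsilon transitions; state i corresponds to regex[i]."""
    m = len(regex)
    graph = [[] for _ in range(m + 1)]  # state m is the accept state
    ops = []  # stack of positions of pending '(' and '|'
    for i, c in enumerate(regex):
        lp = i  # start of the sub-expression that ends at i
        if c in "(|":
            ops.append(i)
        elif c == ")":
            or_positions = []
            while regex[ops[-1]] == "|":
                or_positions.append(ops.pop())
            lp = ops.pop()  # the matching '('
            for o in or_positions:
                graph[lp].append(o + 1)  # enter the alternative after '|'
                graph[o].append(i)       # skip from '|' to the closing ')'
        if i + 1 < m and regex[i + 1] == "*":
            graph[lp].append(i + 1)  # zero repetitions
            graph[i + 1].append(lp)  # loop back for another repetition
        if c in "(*)":
            graph[i].append(i + 1)
    return graph


def reachable(graph, sources):
    """Depth-first search: all states reachable via epsilon transitions."""
    seen, stack = set(), list(sources)
    while stack:
        v = stack.pop()
        if v not in seen:
            seen.add(v)
            stack.extend(graph[v])
    return seen


def matches(regex, text):
    """Return True if the whole of `text` matches `regex`."""
    regex = "(" + regex + ")"  # wrap so top-level '|' is handled
    m = len(regex)
    graph = build_epsilon_graph(regex)
    states = reachable(graph, [0])
    for ch in text:
        # Advance every state whose literal (or '.') consumes this character.
        stepped = [s + 1 for s in states
                   if s < m and regex[s] not in "()|*" and regex[s] in (ch, ".")]
        states = reachable(graph, stepped)
        if not states:
            return False
    return m in states
```

Because the simulation tracks the whole set of live states at once, each input character costs at most one pass over the digraph. A pattern such as `(a*)*b` against a long run of `a` characters, a classic trigger for catastrophic backtracking, is rejected without exponential blowup: `matches("(a*)*b", "a" * 30)` simply ends with no accepting state in the set.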

    Separately re: the OP, https://github.com/rust-lang/regex/issues/822 (and specifically BurntSushi's comment at the very end of the issue) adds really useful context to the paragraph in the OP about niche APIs: https://blog.burntsushi.net/regex-internals/#problem-request... - searching with multiple regexes simultaneously against a text is both incredibly complex and incredibly useful, and I can't wait to see what the community comes up with for this pattern!

  • ILSpy

    .NET Decompiler with support for PDB generation, ReadyToRun, Metadata (&more) - cross-platform!
