Rob Pike's simple C regex matcher in Go

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

Our great sponsors
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • WorkOS - The modern identity platform for B2B SaaS
  • SaaSHub - Software Alternatives and Reviews
  • tiny-matcher

    Dynamic (no compilation), bounded recursive.

  • tiny-regex-c

    Small portable regex in C (cbmc verified, and extended) (by rurban)

  • InfluxDB

    Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

  • regex

    Just a simple limited regex parser / matcher - written for practice instead of utility (by sjpotter)

    So fun story (to me in retrospect).

    I had an interview at google years ago where I basically I was asked to come up with this on my own. In my opinion, a terrible interview Q (per Kernighan's blog, linked in original post, it took pike an hour or two alone in his office), and I bombed it. It was either my just before or just after lunch interview and it stuck with me for the rest of the day and I couldn't give my entire focus to the rest of the sessions.

    anyways, after it was done, I found kernighan's blog post and decided I should try to implement this in java as it will allow me to even get some practice with things like junit and more experience with object oriented programming as I had been living in the C world at that time). so I did, but I then found this regular expression page ( https://www.regular-expressions.info/) and learned many more things about "regular expressions" that i hadn't learned in school (because they aren't regular) and wondered if I could extend pike's simplicity to them in a relatively simple manner. So I did.

    which I created this https://github.com/sjpotter/regex.

    Not meant to be "performant" but meant to be educational to me and perhaps others.

    Then when I was learning go, I rewrote it in Go. I did things that are not "normal" go patterns (i.e. using panic and recover to make error handling cleaner (in my opinion, if an error path is simply if (err) { return err } all the way up, I personally think panics/recover at the top can be cleaner, but it seems most disagree), but it was educational on how to rewrite my java code to an object oriented style in Go and to explore the limitations of interfaces and how one might get around them (also perhaps in ways that go against what might be considered proper go programming)

    https://github.com/sjpotter/regex-go

    though now that I look at that repo, wondering if all the work was done elsewhere and then I just pushed a single commit there, might need to go back and look at it

  • regex-go

    a port of my java regex play library to go, as an exploration of object oriented programming in go

    So fun story (to me in retrospect).

    I had an interview at google years ago where I basically I was asked to come up with this on my own. In my opinion, a terrible interview Q (per Kernighan's blog, linked in original post, it took pike an hour or two alone in his office), and I bombed it. It was either my just before or just after lunch interview and it stuck with me for the rest of the day and I couldn't give my entire focus to the rest of the sessions.

    anyways, after it was done, I found kernighan's blog post and decided I should try to implement this in java as it will allow me to even get some practice with things like junit and more experience with object oriented programming as I had been living in the C world at that time). so I did, but I then found this regular expression page ( https://www.regular-expressions.info/) and learned many more things about "regular expressions" that i hadn't learned in school (because they aren't regular) and wondered if I could extend pike's simplicity to them in a relatively simple manner. So I did.

    which I created this https://github.com/sjpotter/regex.

    Not meant to be "performant" but meant to be educational to me and perhaps others.

    Then when I was learning go, I rewrote it in Go. I did things that are not "normal" go patterns (i.e. using panic and recover to make error handling cleaner (in my opinion, if an error path is simply if (err) { return err } all the way up, I personally think panics/recover at the top can be cleaner, but it seems most disagree), but it was educational on how to rewrite my java code to an object oriented style in Go and to explore the limitations of interfaces and how one might get around them (also perhaps in ways that go against what might be considered proper go programming)

    https://github.com/sjpotter/regex-go

    though now that I look at that repo, wondering if all the work was done elsewhere and then I just pushed a single commit there, might need to go back and look at it

  • tlex

    A debugabble, flexible and customizable lexical analyzer (also powers Galore and Notations).

    Hah this is a classic. I actually used this as a reference to build my own about a year ago in typescript (documentation a long way to go and still WIP) which I am using in a couple of production ready dsls:

    https://github.com/panyam/tlex

  • Redis

    Redis is an in-memory database that persists on disk. The data model is key-value, but many different kind of values are supported: Strings, Lists, Sets, Sorted Sets, Hashes, Streams, HyperLogLogs, Bitmaps.

    Apparently I reinvented the same code several years after Pike:

    https://github.com/redis/redis/blob/99ebbee2b260b3466672b4fa...

  • ripgrep

    ripgrep recursively searches directories for a regex pattern while respecting your gitignore

    This is actually how glob matching is currently implemented in ripgrep: https://github.com/BurntSushi/ripgrep/blob/a9d97a1dda0a07a8e...

    Works quite well because it can take advantage of the host of optimizations inside the regex engine.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

  • gop

    The Go+ programming language is designed for engineering, STEM education, and data science

    > That said, I hope someday Go adds the "?" return-operator

    Same here. I think this is my biggest code-reading pain point as a go developer. I'm toying with the idea of playing more with Go+

    https://github.com/goplus/gop/blob/main/doc/docs.md#error-ha...

  • RE2

    RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.

    See also: https://github.com/google/re2/blob/main/re2/bitstate.cc

    Although its space complexity is P*T, which limits its use to small regexes/haystacks. It sounds like your claim is that your technique achieves a smaller space bound?

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts