-
melody
Melody is a language that compiles to regular expressions and aims to be more readable and maintainable
-
hgrep-smallcore
University project: Haskell implementation of https://www.ccs.neu.edu/home/turon/re-deriv.pdf, with a very small internal regex representation.
-
swift-evolution
This maintains proposals for changes and user-visible enhancements to the Swift Programming Language.
-
RE2
RE2 is a fast, safe, thread-friendly alternative to backtracking regular expression engines like those used in PCRE, Perl, and Python. It is a C++ library.
-
RegExr
RegExr is an HTML/JS-based tool for creating, testing, and learning about regular expressions.
-
oil
Oils is our upgrade path from bash to a better language and runtime. It's also for Python and JavaScript users who avoid shell!
-
regex
An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.
- https://rulex-rs.github.io/ - Very similar to legacy regex syntax, supports macros and number ranges, supports Unicode, _amazing_ error messages that help convert legacy syntax to the new syntax, backslash escapes only for quotes. Rust compiler; as of today there is no built-in way to use it outside Rust (but they seem to be planning it).
('What is your ' ('name'|'quest'|'favorite colour')'?' [s]){1,3}
I've collected the different projects along with a nontrivial syntax example here: https://github.com/SonOfLilit/kleenexp#similar-works
- Regular Expressions - very popular, occasionally reads like line noise, backslash for escape
[1-3 'What is your ' ['name' | 'quest' | 'favourite colour'] '?' [0-1 #space]]
- https://github.com/yoav-lavi/melody - More verbose, supports macros, backslash escapes only for quotes. Rust compiler, Babel plugin. Improving over time and getting quite impressive.
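For comparison, here is a rough sketch of the same example in conventional regex syntax, run through Python's `re` module. The translation is an assumption on my part: it reads `[0-1 #space]` as an optional whitespace character (`\s?`), and the names `PATTERN` and `questions` are purely illustrative.

```python
import re

# Assumed legacy-syntax equivalent of the "What is your ..." example:
# one to three questions, each optionally followed by a whitespace character.
PATTERN = re.compile(
    r"(?:What is your (?:name|quest|favourite colour)\?\s?){1,3}"
)

questions = "What is your name? What is your quest? What is your favourite colour?"

# fullmatch succeeds only if the whole string is one to three questions.
print(PATTERN.fullmatch(questions) is not None)
print(PATTERN.fullmatch("What is your age?") is not None)
```

The non-capturing groups and the escaped `?` are the kind of noise the alternative syntaxes listed above try to remove.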
Yes, and straightforwardly so if you use character classes as your basic building blocks. I wrote a Haskell implementation that is easily extensible to include complements: https://github.com/dan-blank/hgrep-smallcore (I like this project because it translates ERE-compliant regexes down to only 4 constructs, one of which is character classes). It implements https://www.ccs.neu.edu/home/turon/re-deriv.pdf; character classes are described in Section 4.2.
I actually had complement in it as a 5th construct, but as the submission date approached and the examiners found some errors in my logic (my fault for not writing good enough unit tests!), I took complement out again.
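To make the derivative idea concrete, here is a minimal Python sketch of derivative-based matching with character classes and a complement constructor. It is not the hgrep-smallcore code and does not reproduce its 4-construct core; the constructor names (`Empty`, `Eps`, `Class`, `Cat`, `Alt`, `Star`, `Not`) are my own, and the sketch skips the simplification rules a real implementation would need to keep terms small.

```python
from dataclasses import dataclass


class Re:
    """Base class for regex terms (illustrative, not the project's types)."""


@dataclass(frozen=True)
class Empty(Re):  # matches no string at all
    pass


@dataclass(frozen=True)
class Eps(Re):  # matches only the empty string
    pass


@dataclass(frozen=True)
class Class(Re):  # matches exactly one character taken from `chars`
    chars: frozenset


@dataclass(frozen=True)
class Cat(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Alt(Re):
    left: Re
    right: Re


@dataclass(frozen=True)
class Star(Re):
    inner: Re


@dataclass(frozen=True)
class Not(Re):  # complement: matches whatever `inner` does not
    inner: Re


def nullable(r):
    """True if r matches the empty string."""
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, (Empty, Class)):
        return False
    if isinstance(r, Cat):
        return nullable(r.left) and nullable(r.right)
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    if isinstance(r, Not):
        return not nullable(r.inner)
    raise TypeError(r)


def deriv(r, c):
    """Derivative of r with respect to the character c."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Class):
        return Eps() if c in r.chars else Empty()
    if isinstance(r, Cat):
        first = Cat(deriv(r.left, c), r.right)
        return Alt(first, deriv(r.right, c)) if nullable(r.left) else first
    if isinstance(r, Alt):
        return Alt(deriv(r.left, c), deriv(r.right, c))
    if isinstance(r, Star):
        return Cat(deriv(r.inner, c), r)
    if isinstance(r, Not):
        return Not(deriv(r.inner, c))
    raise TypeError(r)


def matches(r, s):
    """Take one derivative per input character, then test nullability."""
    for c in s:
        r = deriv(r, c)
    return nullable(r)


digit = Class(frozenset("0123456789"))
number = Cat(digit, Star(digit))  # one or more digits
```

With these definitions, `matches(number, "2024")` holds while `matches(Not(number), "2024")` does not. Complement costs only one extra case in each of `nullable` and `deriv`, which is what makes it cheap to add in a derivative-based matcher.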
Also notable is Remake, which has Rust bindings: https://github.com/ethanpailes/remake
Interesting. It's very similar to a regex language I created for byte-oriented regular expressions [0].
Similar usability principles: delimited strings, ignored whitespace, and comments.
[0] https://github.com/nishihatapalmer/byteseek/blob/master/synt...
For simple regexes, Swift has short literals, and (AFAIK) you can mix and match the DSL and the short literals. https://github.com/apple/swift-evolution/blob/main/proposals... gives this example:
// A regex for extracting a currency (dollars or pounds) and amount from input
On a related note, if you have Python regex code that you want to make more stable/performant, https://pypi.org/project/pyre2/ is a drop-in replacement for `re` that (configurably) falls back to `re` if you use lookaheads, etc.
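A hedged sketch of what the swap might look like. It assumes the pyre2 package installs its module under the name `re2` (worth checking against the package documentation). The `try`/`except` here exists only so the snippet still runs when pyre2 is absent; it is separate from the library's own configurable fallback to `re`. `parse_version` is a hypothetical helper.

```python
try:
    import re2 as re  # assumed module name provided by pyre2
except ImportError:
    import re  # standard library, so the sketch runs either way

# A pattern without lookaheads or backreferences, so an automaton-based
# engine can handle it without falling back.
VERSION = re.compile(r"(\d+)\.(\d+)\.(\d+)")


def parse_version(text):
    """Return (major, minor, patch) as ints, or None if no version is found."""
    m = VERSION.search(text)
    if m is None:
        return None
    return tuple(int(part) for part in m.groups())
```

Because the rest of the code only touches the `re`-style API, the choice of engine stays confined to the import line.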
The design philosophy behind RE2 for those unfamiliar with the library: https://github.com/google/re2/wiki/WhyRE2
> Are you honestly implying that there are still people who, in all seriousness, use a REGEX to parse HTML
Subsets of it? Yes. See Google's lit-html as an example: https://github.com/lit/lit/blob/main/packages/lit-html/src/l...
RegExr (https://regexr.com/) doesn't come up enough in these discussions. One of the nicest regex debugging/development tools on the internet today.
I added this to the Alternative Regex Syntax wiki page, along with about a dozen similar projects:
https://github.com/oilshell/oil/wiki/Alternative-Regex-Synta...
e.g. compare with Melody 3 months ago: https://news.ycombinator.com/item?id=30358554
and Oil's Eggex:
https://www.oilshell.org/release/latest/doc/eggex.html
From a quick glance Rulex looks very similar to Eggex!
A difference is that Eggex is embedded in a shell so you can use normal assignment statements to build up subpatterns. And you can also interpolate directly into an 'egrep' or 'awk' command.
After looking at all the examples I can't say I'm a fan. Sometimes it's even more verbose than standard regular expressions. Over the years I've become quite familiar with regexp so maybe I'm just biased, but I'd rather have something like CoffeeScript's block expressions instead, where you can easily group and document each part:
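Python's `re.VERBOSE` flag offers a comparable way to group and comment each part while staying in ordinary regex syntax. A minimal sketch, using a hypothetical date pattern of my own rather than anything from the comment:

```python
import re

# In verbose mode, unescaped whitespace is ignored and `#` starts a comment,
# so each part of the pattern can sit on its own documented line.
ISO_DATE = re.compile(
    r"""
    (?P<year>\d{4})    # four-digit year
    -
    (?P<month>\d{2})   # two-digit month
    -
    (?P<day>\d{2})     # two-digit day
    """,
    re.VERBOSE,
)

m = ISO_DATE.fullmatch("2022-05-16")
print(m.group("year"), m.group("month"), m.group("day"))
```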
We were talking about EREs, which are an artifact of POSIX, not UTS#18. So the relevant standard for this specific conversation is POSIX.
To redirect to UTS#18, I don't think UTS#18 subsumes POSIX. UTS#18 doesn't support [[=a=]] for example AFAIK. And UTS#18 more generally doesn't require locale support. UTS#18 Level 3 was actually removed from the spec.
I think UTS#18 is a tortured document, but yes, the regex crate supports pretty much all of UTS#18 Level 1: https://github.com/rust-lang/regex/blob/master/UNICODE.md
Going beyond Level 1 is difficult.