comparitor.insert("iyr", Regex::new(r"^(201[0-9]|2020)$").unwrap());
There's a lot of number parsing.
I would enable both [[:1-12:]] and [[:01-12:]] as options, i.e. without / with leading zeros.
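For comparison, the "1–12, with or without a leading zero" case can already be written as a plain regex, it's just less obvious at a glance. A quick sketch in Python (illustrative only, not the project's proposed syntax):

```python
import re

# Matches 1-12 with an optional leading zero: "1", "01", ..., "12".
# The alternation splits the range at the carry: 0?[1-9] covers 1-9
# (optionally zero-padded), 1[0-2] covers 10-12.
month = re.compile(r"^(?:0?[1-9]|1[0-2])$")

for s in ["1", "01", "9", "09", "10", "12"]:
    assert month.match(s)
for s in ["0", "13", "001", "1.0"]:
    assert not month.match(s)
```

The point of the [[:1-12:]] proposal is exactly that this carry-splitting is easy to get wrong by hand.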
About the variables:
This file would look much more readable with variables that reuse other regexes:
https://github.com/spcan/common-regex/blob/3238bc8ee85e0e000...
I find something like this a lot more readable:
https://github.com/jkrumbiegel/ReadableRegex.jl
It is in Julia, but if you have it installed locally it's just a few keystrokes away. You can even generate the regex, use that in Python, and just add the ReadableRegex form in a comment nearby.
I had a similar kind of idea for a long time, which I put into action a few weeks ago via a standalone transpiler of Emacs' rx macro to common regexp syntaxes.[0] I ended up getting interrupted and didn't completely finish it, but it generally works, though is probably riddled with edge cases.
The basic idea of rx is to use S-expressions to describe regular expressions, and my elevator pitch would've been to embed rx invocations in shell scripts using $(syntax), the main use case being something like sed invocations.
I still think it's a neat idea, and complex regular expressions tend to be hard to parse for humans.
Why don't languages have grok patterns in their standard libraries?
Grok seems to exist only in log-parsing ecosystems, but it really helps with getting rid of little bugs and the wrong parsing that hand-written regex patterns tend to produce.
Instead of writing "^\d+(\.\d+){3}$" for IP checking, which is clearly wrong, you'd write "%{IPV4:ip}", which is so much better.
List of known patterns: https://github.com/hpcugent/logstash-patterns/blob/master/fi...
Even for PHP, a third-party library has only 15 stars.
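To make the bug concrete: the naive pattern accepts octets above 255, while a real IPv4 pattern constrains each octet. A sketch in Python (the strict pattern below is an illustration of the kind of regex %{IPV4} expands to, not grok's exact expansion):

```python
import re

# The naive check: any digits separated by three dots.
naive = re.compile(r"^\d+(\.\d+){3}$")

# A stricter check: each octet limited to 0-255
# (25[0-5] = 250-255, 2[0-4]\d = 200-249, [01]?\d\d? = 0-199).
ipv4 = re.compile(
    r"^(?:(?:25[0-5]|2[0-4]\d|[01]?\d\d?)\.){3}"
    r"(?:25[0-5]|2[0-4]\d|[01]?\d\d?)$"
)

print(bool(naive.match("999.999.999.999")))  # True  -- the bug
print(bool(ipv4.match("999.999.999.999")))   # False
print(bool(ipv4.match("192.168.0.1")))       # True
```

A named grok pattern hides that second regex behind "%{IPV4:ip}", so every caller gets the correct version for free.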
I agree with you. I got so tired of fighting with regex that I reached the point of simply not using it at all if possible.
A comment further up offered a very promising alternative.
https://github.com/VerbalExpressions/JSVerbalExpressions#tes...
It's a bit verbose, but I don't care anymore; I'm too much of a veteran to care about my code being sleek. I want it readable and workable.
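The verbose-but-readable trade-off looks roughly like this. A tiny hypothetical fluent builder in Python (mimicking the VerbalExpressions style, not the real library's API):

```python
import re

class Rx:
    """A minimal fluent regex builder, VerbalExpressions-style (sketch)."""
    def __init__(self):
        self.parts = []
    def start_of_line(self):
        self.parts.append("^"); return self
    def then(self, s):
        self.parts.append(re.escape(s)); return self   # literal text
    def maybe(self, s):
        self.parts.append("(?:" + re.escape(s) + ")?"); return self
    def compile(self):
        return re.compile("".join(self.parts))

url = Rx().start_of_line().then("http").maybe("s").then("://").compile()
print(bool(url.match("https://example.com")))  # True
print(bool(url.match("ftp://example.com")))    # False
```

Each method call is longer than the regex fragment it produces, but the chain reads as a sentence and is hard to typo.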
The fundamental problem comes from assigning meaning to whitespace (in this case, concatenation). I had the same issues when developing KBNF ( https://github.com/kstenerud/kbnf/blob/master/kbnf.md ) which operates in a closely related space.
In early development, I took a number of cues from regex that turned out to be bad ideas, in particular using whitespace for concatenation (which all BNF dialects seem to do).
Switching to '&' for concatenation fixed it and made things a lot clearer, as it would also do for Pomsky:
'Hello' & ' '+ & ('world' | 'pomsky')
It may have originated in one project, but many other log parsers, such as Vector and Fluentd, support grok patterns as well.
FWIW here is a list of other such projects: https://github.com/oilshell/oil/wiki/Alternative-Regex-Synta...