Another issue with Hyperscan is that if you enable HS_FLAG_UTF8[1], which hypergrep does[2,3], and then search invalid UTF-8, then the result is UB.
> This flag instructs Hyperscan to treat the pattern as a sequence of UTF-8 characters. The results of scanning invalid UTF-8 sequences with a Hyperscan library that has been compiled with one or more patterns using this flag are undefined.
That's another issue you'll need to grapple with if you use Hyperscan. PCRE2 used to have this issue[4], but they've since defined the semantics of searching invalid UTF-8 with Unicode mode enabled. ripgrep 14 uses that new mode, but I haven't updated that FAQ answer yet.
[1]: https://intel.github.io/hyperscan/dev-reference/api_files.ht...
[2]: https://github.com/p-ranav/hypergrep/blob/ee85b713aa84e0050a...
[3]: https://github.com/p-ranav/hypergrep/blob/ee85b713aa84e0050a...
[4]: https://github.com/BurntSushi/ripgrep/blob/master/FAQ.md#why...
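One way to grapple with it is to check the haystack before handing it to a matcher compiled in UTF-8 mode. Here's a minimal sketch of that idea, not anything hypergrep or ripgrep actually does. `scan_utf8` and `scan_bytes` are hypothetical callables standing in for a UTF-8-mode scanner and a byte-oriented fallback; neither is a real Hyperscan binding.

```python
def is_valid_utf8(haystack: bytes) -> bool:
    """Return True if haystack is well-formed UTF-8 (strict decoding)."""
    try:
        haystack.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


def guarded_scan(haystack: bytes, scan_utf8, scan_bytes):
    """Use the UTF-8-mode scanner only when the input is valid UTF-8.

    scan_utf8 and scan_bytes are hypothetical callables supplied by the
    caller; this function only decides which one is safe to invoke.
    """
    if is_valid_utf8(haystack):
        return scan_utf8(haystack)
    return scan_bytes(haystack)
```

The catch is that validation is an extra full pass over the input, which is exactly the kind of cost a grep tool wants to avoid.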
I'm the author of ripgrep and its regex engine.
Your claim is true to a first approximation. But greps are line oriented, and that means there are optimizations that can be done that are hard to do in a general regex library.
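To give a rough idea of what line orientation buys you, here's a simplified sketch (an illustration, not ripgrep's actual implementation). It assumes every match of the regex must contain some literal with no newline in it: find that literal with a fast substring search, widen to the enclosing line, and only then run the regex on that one line. The function name and parameters are made up for the example.

```python
import re


def search_lines(haystack: bytes, literal: bytes, pattern: re.Pattern):
    """Yield each line that contains `literal` and matches `pattern`.

    Assumes `literal` is non-empty, contains no newline, and appears in
    every possible match of `pattern`.
    """
    pos = 0
    while True:
        # Fast path: plain substring search, no regex engine involved.
        hit = haystack.find(literal, pos)
        if hit == -1:
            return
        # Widen the candidate to the boundaries of its line.
        start = haystack.rfind(b"\n", 0, hit) + 1
        end = haystack.find(b"\n", hit)
        if end == -1:
            end = len(haystack)
        line = haystack[start:end]
        # Slow path: run the regex on this single line only.
        if pattern.search(line):
            yield line
        # A line is reported at most once, so skip past it entirely.
        pos = end + 1


haystack = b"foo 123\nbar\nfoo abc\nfoo 9\n"
matches = list(search_lines(haystack, b"foo", re.compile(rb"foo \d+")))
```

A general regex library can't skip ahead like this, because it has to report every match position and can't assume matches never span a line terminator.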
If you read my commentary in the ripgrep discussion above, you'll note that it isn't just about the benchmarks themselves being accurate, but the model they represent. Nevertheless, I linked the hypergrep benchmarks not because of Hyperscan, but because they were done by someone who isn't the author of either ripgrep or ugrep.
As for regex benchmarks, you'll want to check out rebar: https://github.com/BurntSushi/rebar
You can see my full thoughts around benchmark design and philosophy if you read the rebar documentation. Be warned though, you'll need some time.
There is a fork of ripgrep with Hyperscan support: https://sr.ht/~pierrenn/ripgrep/
I think you should try it before you read these benchmarks from the authors: https://github.com/Genivia/ugrep-benchmarks
Good find, thanks! I'll check if I prefer it to moar.
As for bat, according to https://github.com/sharkdp/bat#using-bat-on-windows, the Chocolatey package simply installs `less` alongside `bat`. Seems like a good idea, but I haven't tried it.
So ired is a toy. One wonders how many search results you've missed over the years because of ired's feature "it's so minimal that it's wrong!" I mean sometimes tools have bugs. ripgrep has had bugs too. But this one has been in ired since 2009.
What is it that you said? YIKES. Yeah. Seems appropriate.
[1]: https://github.com/BurntSushi/dotfiles/blob/eace294fd80bfde1...
[2]: https://github.com/radare/ired/blob/a1fa7904e6ad239dde950de5...
Also look at https://github.com/stealth/grab from Sebastian Krahmer.
You can also use fzf with ripgrep to great effect:
[1]: https://github.com/junegunn/fzf/blob/master/ADVANCED.md#usin...
I don't believe bat is a pager; it's more of a pretty-printer that tends to call less.
Two pagers that should work on Windows are https://github.com/walles/moar (Go) and https://github.com/markbt/streampager (Rust). There might also be a newer one written in Rust, but I'm unsure.