The Boyer-Moore Fast String Searching Algorithm

InfluxDB - Power Real-Time Data Analytics at Scale

Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.

www.influxdata.com

featured

WorkOS - The modern identity platform for B2B SaaS

The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

workos.com

featured

regex

91 3,345 9.1 Rust

An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

It does actually! But only in some cases. See https://github.com/BurntSushi/ripgrep/issues/617#issuecommen... and https://github.com/rust-lang/regex/issues/408 and https://github.com/rust-lang/regex/pull/419
Note though that it is going away soon. It's going to be replaced with a combination of Two-Way, Rabin-Karp and a fast vectorized prefilter based on this: http://0x80.pl/articles/simd-strfind.html#algorithm-1-generi...
The "secret sauce" is that in the vectorized prefilter, we don't just choose the first and last bytes. Instead, we pick bytes that we think will be rare based on a priori assumption of relative byte frequencies.

ripgrep

348 44,901 9.3 Rust

ripgrep recursively searches directories for a regex pattern while respecting your gitignore

It does actually! But only in some cases. See https://github.com/BurntSushi/ripgrep/issues/617#issuecommen... and https://github.com/rust-lang/regex/issues/408 and https://github.com/rust-lang/regex/pull/419
Note though that it is going away soon. It's going to be replaced with a combination of Two-Way, Rabin-Karp and a fast vectorized prefilter based on this: http://0x80.pl/articles/simd-strfind.html#algorithm-1-generi...
The "secret sauce" is that in the vectorized prefilter, we don't just choose the first and last bytes. Instead, we pick bytes that we think will be rare based on a priori assumption of relative byte frequencies.

InfluxDB

www.influxdata.com featured

Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
rust-memchr

29 758 7.7 Rust

Optimized string search routines for Rust.

> I'm curious how you can speed up finding a single byte? Don't you have to touch all bytes at that point? Leaning heavily on simd instructions?
Yes. That's what memchr does: https://github.com/BurntSushi/rust-memchr/blob/427fdc384007d...
> And I realize newlines is only to track line counts. Am I wrong that that slows things down?
It does, but only a little if it's implemented using vector instructions. You can try with ripgrep using -N/-n. GNU grep's line counting isn't vectorized IIRC, so you might notice bigger differences there.

fast_strstr

1 114 0.0 C

A fast substitution to the stdlib's strstr() sub-string search function.

https://github.com/RaphaelJ/fast_strstr/tree/master/benchmar...
While the algorithm is time linear (Boyer-Moore is sub linear), it ends up being significantly faster on textual content as the per character operations are significantly simplier.

fastsearch

1 0 0.0 Pascal

Implements a search algorithm I invented decades ago in 64 bit pascal

I went through the work, and re-implemented it as an assembly language routine inside a 64bit free pascal program.
It searches about 1 gigabyte of text in 2 seconds.
https://github.com/mikewarot/fastsearch/blob/master/fastsear...

smart

6 93 0.0 JavaScript

String Matching Algorithms Research Tool (by smart-tool)

In the libc's not, indeed. They are still in the stoneage of string search.
But the fastest is currently EPSM, by S. Faro and O. M. Kulekci. See https://smart-tool.github.io/smart/
"Exact Packed String Matching" optimized for SIMD SSE4.2/AVX (x86_64 and aarch64). It performs stable and best on all sizes.
The site I linked to compares 199 fast string search algorithms, with the usual ones (BM, KMP, BMH) being pretty slow. EPSM outperforms all the others being mentioned here on these platforms. It's also the latest.

boyermoore

2 55 0.0 Go

Implementation of Boyer-Moore fast string search algorithm in Go

I implemented it in Go for fun. I expected Go's standard lib will beat it but it perform around 10 times faster than `strings.Index`.
https://github.com/sarpdag/boyermoore
Go's strings package depends on the substring size tries different approaches like Rabin Karp algorithm, (https://en.wikipedia.org/wiki/Rabin%E2%80%93Karp_algorithm)

WorkOS

workos.com featured

The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Website Question

3 projects | /r/nfl | 15 Nov 2022
Be careful of the examples you use. They stick

2 projects | news.ycombinator.com | 22 Aug 2023
How do I learn grep in detail. I work at support and day to day need to have to deal with heavy logs. I have seen people playing with operators and other stuff to get the desired output. Someone was using Sed also. Please help.🙏

2 projects | /r/linuxquestions | 20 Mar 2022
Ripgrep

1 project | news.ycombinator.com | 25 Feb 2024
CryptoFlow: Building a secure and scalable system with Axum and SvelteKit - Part 3

3 projects | dev.to | 8 Jan 2024

The Boyer-Moore Fast String Searching Algorithm

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Text processing Regex Applications written in Rust RegExp Ripgrep
Post date: 23 Apr 2021

regex

ripgrep

InfluxDB

rust-memchr

fast_strstr

fastsearch

smart

boyermoore

WorkOS

Related posts

Website Question

Be careful of the examples you use. They stick

How do I learn grep in detail. I work at support and day to day need to have to deal with heavy logs. I have seen people playing with operators and other stuff to get the desired output. Someone was using Sed also. Please help.🙏

Ripgrep

CryptoFlow: Building a secure and scalable system with Axum and SvelteKit - Part 3

The Boyer-Moore Fast String Searching Algorithm

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com Text processing Regex Applications written in Rust RegExp Ripgrep Post date: 23 Apr 2021

Related posts

Website Question

Be careful of the examples you use. They stick

How do I learn grep in detail. I work at support and day to day need to have to deal with heavy logs. I have seen people playing with operators and other stuff to get the desired output. Someone was using Sed also. Please help.🙏

Ripgrep

CryptoFlow: Building a secure and scalable system with Axum and SvelteKit - Part 3

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Text processing Regex Applications written in Rust RegExp Ripgrep
Post date: 23 Apr 2021