How can I efficiently search for a specific string in a large text file using C#?

ripgrep

348 44,901 9.3 Rust

ripgrep recursively searches directories for a regex pattern while respecting your gitignore

Right. The "generic SIMD" algorithm is one I'm quite familiar with and have implemented. It's what ripgrep uses for example, although it's a little smarter than "just take the first and last bytes." ripgrep tries to guess at which bytes are the best to pick to maximize throughput by reducing false positives in the initial candidate scan. You can see the implementation here: https://github.com/BurntSushi/memchr/tree/master/src/memmem
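To make the comment above a little more concrete, here is a minimal scalar sketch of the "take the first and last bytes" candidate scan it describes. This is not the code from ripgrep or the memchr crate: the function name `find_first_last` is hypothetical, the bytes are always the first and last of the needle (no guessing at better ones), and positions are checked one at a time, whereas a SIMD version would compare many positions per instruction.

```rust
/// Returns the index of the first occurrence of `needle` in `haystack`.
/// Hypothetical helper for illustration only; a scalar stand-in for the
/// SIMD candidate scan, not the memchr crate's implementation.
fn find_first_last(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // By convention, an empty needle matches at the start.
    if needle.is_empty() {
        return Some(0);
    }
    if needle.len() > haystack.len() {
        return None;
    }

    let last_off = needle.len() - 1;
    let first = needle[0];
    let last = needle[last_off];

    // Only positions where the whole needle still fits are considered.
    for i in 0..=(haystack.len() - needle.len()) {
        // Candidate scan: two cheap byte comparisons reject most positions.
        if haystack[i] == first && haystack[i + last_off] == last {
            // Verification: a full comparison rules out false positives.
            if &haystack[i..i + needle.len()] == needle {
                return Some(i);
            }
        }
    }
    None
}
```

Calling `find_first_last(b"the quick brown fox", b"brown")` returns `Some(10)`. The fewer positions that pass the two-byte check without being real matches, the less time is spent in the full comparison, which is why the choice of bytes matters for throughput.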

Apache Lucene

7 2,147 8.2 C#

Apache Lucene.NET
StreamRegex

1 9 6.2 C#

A .NET Standard 2.1+ Library to perform string parsing operations on Streams and StreamReaders. Includes Extensions for Regex.
nlc

2 9 10.0 C#

Line counter written in C# targeting .NET 6
fnlc

2 12 1.3 C#

A line-counter written in C# and using Intrinsics
Microsoft.IO.RecyclableMemoryStream

10 1,887 7.4 C#

A library to provide pooling for .NET MemoryStream objects to improve application performance.

Another suggestion to try: Microsoft provides a library called Microsoft.IO.RecyclableMemoryStream, which greatly reduces the amount of memory that has to be garbage collected when streaming large amounts of data.

rust-memchr

29 754 7.9 Rust

Optimized string search routines for Rust.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.