rust-memchr vs .NET Runtime

Project	rust-memchr	.NET Runtime
Mentions	29	608
Stars	758	14,139
Growth	-	1.6%
Activity	7.7	10.0
Latest Commit	about 1 month ago	2 days ago
Language	Rust	C#
License	The Unlicense	MIT License

The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

rust-memchr

Posts with mentions or reviews of rust-memchr. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-26.

Memchr: Optimized string search routines for Rust
1 project | news.ycombinator.com | 13 Jan 2024
Ask HN: What's the fastest programming language with a large standard library?
9 projects | news.ycombinator.com | 26 Dec 2023

That's what the `memchr` crate does. It uses `vshrn` just like in your links. And vpmaxq before even bothering with vshrn: https://github.com/BurntSushi/memchr/blob/c6b885b870b6f1b9bf...
Rust-Cache
1 project | news.ycombinator.com | 4 Dec 2023

I agree with everything you said, but I don't see how it leads to the OP's formulation being silly or wrong or not useful. Here's another example (of my own) where you can pick an upper bound for `n` and base your complexity analysis around it. In this case, we're trying to provide an API guarantee that a search takes O(m+n) time despite wanting to use an O(mn) algorithm in some subset of cases. We can still meet the O(m+n) bound by reasoning that the O(mn) algorithm is only used in a finite set of cases, and thus collapses to O(1) time. Therefore, the O(m+n) time bound is preserved. And this isn't a trick either. That really is the scaling behavior of the implementation. See: https://github.com/BurntSushi/memchr/blob/ce7b8e606410f6c81a...
> If an algorithm is defined as being unscalable (fixed input), what sense does it make to describe that it scales constantly with input size?
I'll answer your question with another: in what cases does it make sense to describe the scaling behavior of algorithm with O(1)?
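The bounded-case argument in the comment above can be illustrated with a small sketch. This is not the `memchr` crate's code: the cutoff `NAIVE_MAX_NEEDLE` and all function names are hypothetical, and Knuth-Morris-Pratt stands in as the linear-time fallback purely because it is short. Since the quadratic routine only ever runs when the needle length is at most a fixed constant, its cost is bounded by a constant times the haystack length, so the dispatcher as a whole stays linear.

```rust
/// Hypothetical cutoff: needles at most this long use the quadratic routine.
const NAIVE_MAX_NEEDLE: usize = 8;

/// Naive search: O(m * n) in general, trying every start position.
fn find_naive(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.len() > haystack.len() {
        return None;
    }
    (0..=haystack.len() - needle.len()).find(|&i| &haystack[i..i + needle.len()] == needle)
}

/// Knuth-Morris-Pratt search: O(m + n) for any needle.
fn find_kmp(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.is_empty() {
        return Some(0);
    }
    // failure[i] = length of the longest proper prefix of needle[..=i]
    // that is also a suffix of it.
    let mut failure = vec![0usize; needle.len()];
    let mut k = 0;
    for i in 1..needle.len() {
        while k > 0 && needle[i] != needle[k] {
            k = failure[k - 1];
        }
        if needle[i] == needle[k] {
            k += 1;
        }
        failure[i] = k;
    }
    let mut matched = 0;
    for (i, &b) in haystack.iter().enumerate() {
        while matched > 0 && b != needle[matched] {
            matched = failure[matched - 1];
        }
        if b == needle[matched] {
            matched += 1;
        }
        if matched == needle.len() {
            return Some(i + 1 - needle.len());
        }
    }
    None
}

/// Dispatcher: with m <= NAIVE_MAX_NEEDLE the naive path costs at most
/// NAIVE_MAX_NEEDLE * n steps, so the overall bound remains O(m + n).
fn find(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    if needle.len() <= NAIVE_MAX_NEEDLE {
        find_naive(haystack, needle)
    } else {
        find_kmp(haystack, needle)
    }
}
```

Calling `find(b"hello world", b"wor")` takes the naive path, while any needle longer than eight bytes is routed to the linear-time routine.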
Rust memchr adds aarch64 SIMD with impressive speedups
1 project | news.ycombinator.com | 29 Aug 2023
Stringzilla: Fastest string sort, search, split, and shuffle using SIMD
9 projects | news.ycombinator.com | 29 Aug 2023

Copying my feedback from reddit[1], where I discussed it in the context of the `memchr` crate.[2]
I took a quick look at your library implementation and have some notes:
* It doesn't appear to query CPUID, so I imagine the only way it uses AVX2 on x86-64 is if the user compiles with that feature enabled explicitly. (Or uses something like [`x86-64-v3`](https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...).) The `memchr` crate doesn't need that. It will use AVX2 even if the program isn't compiled with AVX2 enabled so long as the current CPU supports it.
* Your substring routines have multiplicative worst case (that is, `O(m * n)`) running time. The `memchr` crate only uses SIMD for substring search for smallish needles. Otherwise it flips over to Two-Way with a SIMD prefilter. You'll be fine for short needles, but things could go very very badly for longer needles.
* It seems quite likely that your [confirmation step](https://github.com/ashvardanian/Stringzilla/blob/fab854dc4fd...) is going to absolutely kill performance for even semi-frequently occurring candidates. The [`memchr` crate utilizes information from the vector step to limit where and when it calls `memcmp`](https://github.com/BurntSushi/memchr/blob/46620054ff25b16d22...). Your code might do well in cases where matches are very rare. I took a quick peek at your benchmarks and don't see anything that obviously stresses this particular case. For substring search, the `memchr` crate uses a variant of the "[generic SIMD](http://0x80.pl/articles/simd-strfind.html#first-and-last)" algorithm. Basically, it takes two bytes from the needle, looks for positions where those occur and then attempts to check whether that position corresponds to a match. It looks like your technique uses the first 4 bytes. I suspect that might be overkill. (I did try using 3 bytes from the needle and found that it was a bit slower in some cases.) That is, two bytes is usually enough predictive power to lower the false positive rate enough. Of course, one can write pathological inputs that cause either one to do better than the other. (The `memchr` crate's benchmark suite has a [collection of pathological inputs](https://github.com/BurntSushi/memchr/blob/46620054ff25b16d22...).)
It would actually be possible to hook Stringzilla up to `memchr`'s benchmark suite if you were interested. :-)
[1]: https://old.reddit.com/r/rust/comments/163ph8r/memchr_26_now...
[2]: https://github.com/BurntSushi/memchr
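The two-byte prefilter idea described in the comment above can be sketched in scalar Rust. This is only an illustration under stated assumptions: it probes the first and last needle bytes (as in the linked "first and last" article), processes one position at a time rather than a vector of positions, and the function name `find_first_last` is hypothetical. It is not the `memchr` crate's implementation.

```rust
/// Scalar sketch of a two-byte prefilter: a position is only a candidate
/// if both probe bytes line up, and only candidates get a full comparison.
fn find_first_last(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    let m = needle.len();
    if m == 0 {
        return Some(0);
    }
    if m > haystack.len() {
        return None;
    }
    // Assumption: probe the first and last bytes of the needle.
    let (first, last) = (needle[0], needle[m - 1]);
    for i in 0..=haystack.len() - m {
        // Cheap filter: two single-byte comparisons.
        if haystack[i] == first && haystack[i + m - 1] == last {
            // Expensive confirmation step, reached only for candidates.
            if &haystack[i..i + m] == needle {
                return Some(i);
            }
        }
    }
    None
}
```

A SIMD version would test many start positions per instruction with the same two probes; the trade-off discussed above is how many probe bytes are needed to keep the confirmation step rare.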
Ripgrep now twice as fast on Apple Silicon with new aarch64 SIMD implementations
1 project | news.ycombinator.com | 28 Aug 2023
Regex Engine Internals as a Library
5 projects | news.ycombinator.com | 5 Jul 2023

The current PR for ARM SIMD[1] uses a different instruction mix to achieve the same goals as movemask. I tested the PR and it has a significant speedup over the non-vectorized version.
[1]https://github.com/BurntSushi/memchr/pull/114
Sneller Regex vs Ripgrep
3 projects | news.ycombinator.com | 18 May 2023

And that is the primary reason why ripgrep doesn't bother with AVX-512. Not because of some lack of skill as this blog suggests:
> Additionally, ripgrep uses AVX2 and does not take advantage of AVX-512 instruction sets, but this can be forgiven given the specialized skills required for handcrafting for SkylakeX and Icelake/Zen4 processors.
Namely, I tried running sneller on my CPU, which is a pretty recent i9-12900K, and not even it supports AVX-512. That's because Intel has been dropping support for AVX-512 from its more recent consumer grade CPUs. ripgrep is running far more frequently on consumer grade CPUs, so supporting AVX-512 is probably not particularly advantageous. At least, it's not obvious to me that it's worth doing. And certainly, the skill argument isn't entirely wrong. I'd have to invest developer time to make it work.
I think there are two other things worth highlighting from this blog.
First is that sneller seems to do quite well with compressed data. This is definitely not ripgrep's strong suit. When you use ripgrep's -z/--search-zip flag, all it's doing is shelling out to your gzip/xz/whatever executable to do the decompression work, which is then streamed into ripgrep for searching. So if your search speed tanks when using -z/--search-zip, it's likely because your decompression tools are slow, not because of ripgrep. But it's a fair comparison from sneller's perspective, because it seems to integrate the two.
Second is the issue of multi-threaded search. In ripgrep, the fundamental unit of work is "search a file." ripgrep has no support for more granular parallelism. That is, if you give it one file, it's limited to doing a single threaded search. ripgrep could do more granular parallelism, but it hasn't been obviously worth it to me. If most searches are on a directory tree, then parallelizing at the level of each file is almost certainly good enough. Making ripgrep's parallelism more fine grained is a fair bit of work too, and there would be a lot of fiddly stuff to get right. If I could run sneller easily, I'd probably try to see how it does in a more varied workload than what is presented in this blog. :-)
And finally some corrections:
> However, when using a single thread, ripgrep appears to be slightly faster.
Not just slightly faster, over 2x faster!
The single threaded results for Regex2 and Regex3 for Sneller are quite nice! I'd be interested in hearing more about what you're doing in the Regex2 case, since Sneller and ripgrep are about on par with the Regex3 case. Maybe a fail fast optimization?
> The reason for this is that ripgrep uses the Boyer-Moore string search algorithm, which is a pattern matching algorithm that is often used for searching for substrings within larger strings. It is particularly efficient when the pattern being searched for is relatively long and the alphabet of characters being searched over is relatively small. Sneller does not use this substring search algorithm and as a result is slower than ripgrep with substrings. However, when long substrings are not present, Sneller outperforms ripgrep.
ripgrep has never used Boyer-Moore. (Okay, some years ago, ripgrep could use Boyer-Moore in certain niche cases. But that hasn't been the case for a while and it was never the thing most commonly used). What ripgrep uses today is succinctly described here: https://github.com/BurntSushi/memchr#algorithms-used (But it has always eschewed algorithms like Boyer-Moore in favor of more heuristic-y approaches based on a background frequency distribution of bytes.)
I think I would also contest the claim that "long substrings" are the key here. ripgrep is plenty fast with short substrings too. You're correct that if you have no literals then ripgrep will get slower because it has to fall back to the regex engine. But I'd like to see more robust benchmarks there. Your Regex2 and Regex3 benchmarks raise more questions than it answers. :-)
> Although the resulting .dot and .svg files may be somewhat clunky, we can still observe from the graph that the number of nodes and edges are small enough to use the branchless IceLake implementation. In this particular case, we only need 8 bits to encode the number of nodes and the number of distinct edges, enabling the tool to use (what we call) the 8-bit DFA implementation. For more details on how this works, see our post on regex implementations.
So this is talking about the DFA graph for the regex `Sherlock [A-Z]\w+`. It's important to point out that, in ripgrep, `\w` is Unicode aware by default. Which makes it absolutely enormous. So I think the state graph you linked is probably only for the ASCII version of that regex.
Indeed, reading your regex blog[1], it perhaps looks like a lot of the tricks you use won't work for Unicode, because Unicode tends to blow up finite automata.
If I could run Sneller, I'd probably try to poke it to see what its Unicode support looks like. From a quick glance at the source code, it also looks like you build full DFAs. So I would also try to poke it to see what happens when handed a not-so-small regex. (Building a DFA can take quite some time.)
Ah okay, I see, you put a max limit on the DFA: https://github.com/SnellerInc/sneller/blob/bb5adec564bf9869d...
Overall this is a very cool project!
[1]: https://sneller.io/blog/accelerating-regex-using-avx-512/
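The "search a file" unit of work mentioned in the comment above can be sketched as follows. This is a toy model and not ripgrep's code: files are in-memory `(name, contents)` pairs, matching is a plain substring test, one scoped thread is spawned per file rather than using a worker pool, and the name `search_files` is hypothetical.

```rust
use std::thread;

/// Counts matching lines per file, with one unit of work per file.
/// A single large file still gets exactly one thread, which is the
/// granularity limitation described in the comment.
fn search_files<'a>(files: &[(&'a str, &'a str)], pattern: &str) -> Vec<(&'a str, usize)> {
    thread::scope(|s| {
        // Spawn one scoped worker per file.
        let handles: Vec<_> = files
            .iter()
            .map(|&(name, contents)| {
                s.spawn(move || {
                    let hits = contents
                        .lines()
                        .filter(|line| line.contains(pattern))
                        .count();
                    (name, hits)
                })
            })
            .collect();
        // Join in spawn order so results line up with the input order.
        handles
            .into_iter()
            .map(|h| h.join().expect("worker thread panicked"))
            .collect()
    })
}
```

Passing a single file to `search_files` yields a single worker, so all of the parallelism comes from having many files to search.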
SIMD with Zig
6 projects | news.ycombinator.com | 1 May 2023

Indeed. This is how ripgrep works. It's compiled for just plain `x86_64`, but it looks for whether things like AVX2 are enabled. And if so, uses vector algorithms for substring and multi-substring search. The nice thing about dealing with strings is that the "coarse" requirement is already somewhat natural to the domain.
But, this functionality is absolutely critical. It doesn't even have to be automatic. Just the ability to compile functions with certain ISA extensions enabled, and then only call them when the requisite CPU features are enabled is enough.
In a nutshell: https://github.com/BurntSushi/memchr/blob/8037d11b4357b0f07b...
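A minimal sketch of that pattern in Rust, assuming a toy byte-counting routine rather than anything from the `memchr` crate: one function is compiled with AVX2 enabled via `#[target_feature]`, and it is only called after `is_x86_feature_detected!` confirms support at runtime. The function names are hypothetical, and the AVX2 variant here relies on the compiler's auto-vectorization instead of hand-written intrinsics.

```rust
/// Portable fallback, compiled for the baseline target.
fn count_byte_fallback(haystack: &[u8], byte: u8) -> usize {
    haystack.iter().filter(|&&b| b == byte).count()
}

/// Same logic, but this one function is compiled with AVX2 enabled,
/// so the compiler is free to emit AVX2 instructions inside it.
#[cfg(target_arch = "x86_64")]
#[target_feature(enable = "avx2")]
unsafe fn count_byte_avx2(haystack: &[u8], byte: u8) -> usize {
    haystack.iter().filter(|&&b| b == byte).count()
}

/// Dispatcher: picks the AVX2 build only when the running CPU supports it.
fn count_byte(haystack: &[u8], byte: u8) -> usize {
    #[cfg(target_arch = "x86_64")]
    {
        if is_x86_feature_detected!("avx2") {
            // SAFETY: AVX2 support was verified at runtime just above.
            return unsafe { count_byte_avx2(haystack, byte) };
        }
    }
    count_byte_fallback(haystack, byte)
}
```

The binary itself can still be built for plain `x86_64`; only the body of `count_byte_avx2` uses the extension, and it is unreachable on CPUs that lack it.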
Tree Borrows - A new aliasing model for Rust
6 projects | /r/rust | 28 Mar 2023

/u/nvanille Excellent work. Evaluating whether this all makes sense is very much above my pay-grade, but I'm all in favor of making it harder to shoot yourself in the foot with unsafe. This actually happened with the memchr crate, if you're interested in those details.

.NET Runtime

Posts with mentions or reviews of .NET Runtime. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-04-22.

Airline keeps mistaking 101-year-old woman for baby
1 project | news.ycombinator.com | 28 Apr 2024

It's an interesting "time is a circle" problem given that a century only has 100 years and then we loop around again. 2-digit years are convenient for people in many situations, but they are very lossy and horrible for machines.
It reminds me of this breaking change to .Net from last year.[1][2] Maybe AA just needs to update .Net which would pad them out until the 2050's when someone born in the 1950s would be having...exactly the same problem in the article. (It is configurable now so you could just keep pushing it each decade, until it wraps again).
Or they could use 4-digit years.
[1] https://github.com/dotnet/runtime/issues/75148
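To make the wrap-around concrete, here is a small sketch of how a configurable pivot maps a 2-digit year onto a 100-year window. The function name and the pivot values used below are illustrative assumptions, not the .NET API itself.

```rust
/// Maps a 2-digit year into the 100-year window that ends at `max_year`.
/// `max_year` is the configurable pivot: the latest 4-digit year that a
/// 2-digit year is allowed to mean.
fn to_four_digit_year(two_digit: u32, max_year: u32) -> u32 {
    assert!(two_digit < 100, "expected a 2-digit year");
    // Start in the pivot's century, then step back a century on overshoot.
    let candidate = max_year / 100 * 100 + two_digit;
    if candidate > max_year {
        candidate - 100
    } else {
        candidate
    }
}
```

With a pivot of 2049, the input `50` maps to 1950 and `49` maps to 2049. The input `22` maps to 2022 under any pivot from 2022 through 2121, which is the kind of ambiguity that turns someone born in 1922 into a baby.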
The software industry rapidly converging on 3 languages: Go, Rust, and JavaScript
1 project | news.ycombinator.com | 24 Apr 2024

These can also be passed as arguments to `dotnet publish` if necessary.
Reference:
- https://learn.microsoft.com/en-us/dotnet/core/deploying/nati...
- https://github.com/dotnet/runtime/blob/main/src/coreclr/nati...
- https://github.com/dotnet/runtime/blob/5b4e770daa190ce69f402... (full list of recognized keys for IlcInstructionSet)
The Performance Impact of C++'s `final` Keyword
6 projects | news.ycombinator.com | 22 Apr 2024

Yes, that is true. I'm not sure about JVM implementation details but the reason the comment says "virtual and interface" calls is to outline the difference. Virtual calls in .NET are sufficiently close[0] to virtual calls in C++. Interface calls, however, are coded differently[1].
Also you are correct - virtual calls are not terribly expensive, but they encroach on ever limited* CPU resources like indirect jump and load predictors and, as noted in parent comments, block inlining, which is highly undesirable for small and frequently called methods, particularly when they are in a loop.
* through great effort of our industry to take back whatever performance wins each generation brings with even more abstractions that fail to improve our productivity
[0] https://github.com/dotnet/coreclr/blob/4895a06c/src/vm/amd64...
[1] https://github.com/dotnet/runtime/blob/main/docs/design/core... (mind you, the text was initially written 18 years ago, wow)
Java 23: The New Features Are Officially Announced
5 projects | news.ycombinator.com | 17 Apr 2024

If you care about portable SIMD and performance, you may want to save yourself trouble and skip to C# instead, it also has an extensive guide to using it: https://github.com/dotnet/runtime/blob/69110bfdcf5590db1d32c...
CoreLib and many new libraries are using it heavily to match the performance of manually intrinsified C++ code.
Locally test and validate your Renovate configuration files
4 projects | dev.to | 9 Apr 2024

DEBUG: packageFiles with updates (repository=local)
"config": {
  "nuget": [
    {
      "deps": [
        {
          "datasource": "nuget",
          "depType": "nuget",
          "depName": "Microsoft.Extensions.Hosting",
          "currentValue": "7.0.0",
          "updates": [
            {
              "bucket": "non-major",
              "newVersion": "7.0.1",
              "newValue": "7.0.1",
              "releaseTimestamp": "2023-02-14T13:21:52.713Z",
              "newMajor": 7,
              "newMinor": 0,
              "updateType": "patch",
              "branchName": "renovate/dotnet-monorepo"
            },
            {
              "bucket": "major",
              "newVersion": "8.0.0",
              "newValue": "8.0.0",
              "releaseTimestamp": "2023-11-14T13:23:17.653Z",
              "newMajor": 8,
              "newMinor": 0,
              "updateType": "major",
              "branchName": "renovate/major-dotnet-monorepo"
            }
          ],
          "packageName": "Microsoft.Extensions.Hosting",
          "versioning": "nuget",
          "warnings": [],
          "sourceUrl": "https://github.com/dotnet/runtime",
          "registryUrl": "https://api.nuget.org/v3/index.json",
          "homepage": "https://dot.net/",
          "currentVersion": "7.0.0",
          "isSingleVersion": true,
          "fixedVersion": "7.0.0"
        }
      ],
      "packageFile": "RenovateDemo.csproj"
    }
  ]
}
Chrome Feature: ZSTD Content-Encoding
10 projects | news.ycombinator.com | 1 Apr 2024

Support zstd Content-Encoding: https://github.com/dotnet/runtime/issues/59591
Writing x86 SIMD using x86inc.asm (2017)
3 projects | news.ycombinator.com | 26 Mar 2024
Why choose async/await over threads?
11 projects | news.ycombinator.com | 25 Mar 2024

We might not be that far away already. There is this issue[1] on GitHub, where Microsoft and the community discuss some significant changes.
There are still a lot of unanswered questions, but initial tests look promising.
[1] https://github.com/dotnet/runtime/issues/94620
Redis License Changed
11 projects | news.ycombinator.com | 20 Mar 2024

https://github.com/dotnet/dotnet exists for source build that stitches together SDK, Roslyn, runtime and other dependencies. A lot of them can be built and used individually, which is what contributors usually do. For example, you can clone and build https://github.com/dotnet/runtime and use the produced artifacts to execute .NET assemblies or build .NET binaries.
Garnet – A new remote cache-store from Microsoft Research
6 projects | news.ycombinator.com | 18 Mar 2024

Yeah, it kind of is. Quite a few experiments are conducted to see whether they show promise in prototype form; those that do are taken further for proper integration.
Unfortunately, object stack allocation was not one of them. Even though the DOTNET_JitObjectStackAllocation configuration knob exists today, enabling it has virtually no impact because it almost never kicks in. By the end of the experiment[0], it was concluded that, given how a lot of C# code is written, there is plenty of lower-hanging fruit to pick before investing effort in this kind of feature becomes worthwhile.
By contrast, as a continuation of the green threads experiment, a runtime-handled tasks experiment[1], which moves async state machine handling from IL emitted by Roslyn to special-cased methods handled purely in runtime code, has been a massive success and is now being worked on for integration in one of the future versions of .NET (hopefully 10?).
[0] https://github.com/dotnet/runtime/issues/11192
[1] https://github.com/dotnet/runtimelab/blob/feature/async2-exp...

What are some alternatives?

When comparing rust-memchr and .NET Runtime you can also consider the following projects:

thefuck - Magnificent app which corrects your previous console command.

Ryujinx - Experimental Nintendo Switch Emulator written in C#

htop - htop - an interactive process viewer

ASP.NET Core - ASP.NET Core is a cross-platform .NET framework for building modern cloud-based web applications on Windows, Mac, or Linux.

duf - Disk Usage/Free Utility - a better 'df' alternative

actix-web - Actix Web is a powerful, pragmatic, and extremely fast web framework for Rust.

bottom - Yet another cross-platform graphical process/system monitor.

WASI - WebAssembly System Interface

fzf - :cherry_blossom: A command-line fuzzy finder

CoreCLR - CoreCLR is the runtime for .NET Core. It includes the garbage collector, JIT compiler, primitive data types and low-level classes.

regex - An implementation of regular expressions for Rust. This implementation uses finite automata and guarantees linear time matching on all inputs.

vgpu_unlock - Unlock vGPU functionality for consumer grade GPUs.
