StringZilla
rust-playground
StringZilla | rust-playground | |
---|---|---|
14 | 71 | |
1,811 | 1,174 | |
- | 1.8% | |
9.8 | 9.5 | |
15 days ago | 15 days ago | |
C++ | Rust | |
Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
StringZilla
-
Measuring energy usage: regular code vs. SIMD code
The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when
A. you do byte-level processing instead of float words;
B. you use embedded, IoT, and other low-energy devices.
A few years ago I've compared Nvidia Jetson Xavier (long before the Orin release), Intel-based MacBook Pro with Core i9, and AVX-512 capable CPUs on substring search benchmarks.
On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J which was an 8.3x improvement in inefficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
- Show HN: StringZilla v3 with C++, Rust, and Swift bindings, and AVX-512 and NEON
-
How fast is rolling Karp-Rabin hashing?
This is extremely timely! I was working on SIMD variants for collision-resistant rolling-hash variants in the last few weeks for the v3 release of the StringZilla library [1].
I have tried several 4-way and 8-way parallel variants using AVX-512 DQ instructions for 64-bit integer multiplications [2] as well as using integer FMA instructions on Arm NEON with 32-bit multiplications [3]. The latter needs a better mixing approach to be collision-resistant.
So far I couldn't exceed 1 GB/s/core [4], so more research is needed. If you have any ideas - I am all ears!
[1]: https://github.com/ashvardanian/StringZilla/blob/bc1869a8529...
[2]: https://github.com/ashvardanian/StringZilla/blob/bc1869a8529...
[3]: https://github.com/ashvardanian/StringZilla/blob/bc1869a8529...
[4]: https://github.com/ashvardanian/StringZilla/tree/main-dev?ta...
-
4B If Statements
Jokes aside, lookup tables are a common technique to avoid costly operations. I was recently implementing one to avoid integer division. In my case I knew that the nominator and denominator were 8 bit unsigned integers, so I've replaced the division with 2 table lookups and 6 shifts and arithmetic operations [1]. The well known `libdivide` [2] does that for arbitrary 16, 32, and 64 bit integers, and it has precomputed magic numbers and lookup tables for all 16-bit integers in the same repo.
[1]: https://github.com/ashvardanian/StringZilla/blob/9f6ca3c6d3c...
-
Python, C, Assembly – Faster Cosine Similarity
That matches my experience, and goes beyond GCC and Clang. Between 2018 and 2020 I was giving a lot of lectures on this topic and we did a bunch of case studies with Intel on their older ICC and what later became the OneAPI.
Short story, unless you are doing trivial data-parallel operations, like in SimSIMD, compilers are practically useless. As a proof, I wrote what is now the StringZilla library (https://github.com/ashvardanian/stringzilla) and we've spent weeks with an Intel team, tuning the compiler, no result. So if you are processing a lot of strings, or variable-length coded data, like compression/decompression, hand-written SIMD kernels are pretty much unbeatable.
- Stringzilla: 10x Faster SIMD-accelerated String Class
-
Stringzilla: 10x faster SIMD-accelerated Python `str` class
Blog post
-
Stringzilla: Fastest string sort, search, split, and shuffle using SIMD
Copying my feedback from reddit[1], where I discussed it in the context of the `memchr` crate.[2]
I took a quick look at your library implementation and have some notes:
* It doesn't appear to query CPUID, so I imagine the only way it uses AVX2 on x86-64 is if the user compiles with that feature enabled explicitly. (Or uses something like [`x86-64-v3`](https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...).) The `memchr` crate doesn't need that. It will use AVX2 even if the program isn't compiled with AVX2 enabled so long as the current CPU supports it.
* Your substring routines have multiplicative worst case (that is, `O(m * n)`) running time. The `memchr` crate only uses SIMD for substring search for smallish needles. Otherwise it flips over to Two-Way with a SIMD prefilter. You'll be fine for short needles, but things could go very very badly for longer needles.
* It seems quite likely that your [confirmation step](https://github.com/ashvardanian/Stringzilla/blob/fab854dc4fd...) is going to absolutely kill performance for even semi-frequently occurring candidates. The [`memchr` crate utilizes information from the vector step to limit where and when it calls `memcmp`](https://github.com/BurntSushi/memchr/blob/46620054ff25b16d22...). Your code might do well in cases where matches are very rare. I took a quick peek at your benchmarks and don't see anything that obviously stresses this particular case. For substring search, the `memchr` crate uses a variant of the "[generic SIMD](http://0x80.pl/articles/simd-strfind.html#first-and-last)" algorithm. Basically, it takes two bytes from the needle, looks for positions where those occur and then attempts to check whether that position corresponds to a match. It looks like your technique uses the first 4 bytes. I suspect that might be overkill. (I did try using 3 bytes from the needle and found that it was a bit slower in some cases.) That is, two bytes is usually enough predictive power to lower the false positive rate enough. Of course, one can write pathological inputs that cause either one to do better than the other. (The `memchr` crat benchmark suite has a [collection of pathological inputs](https://github.com/BurntSushi/memchr/blob/46620054ff25b16d22...).)
It would actually be possible to hook Stringzilla up to `memchr`'s benchmark suite if you were interested. :-)
[1]: https://old.reddit.com/r/rust/comments/163ph8r/memchr_26_now...
[2]: https://github.com/BurntSushi/memchr
-
Show HN: Faking SIMD to Search and Sort Strings 5x Faster
I took a look at Stringzilla (https://github.com/ashvardanian/stringzilla), and in addition to the impressive benchmarks, the API looks pretty straightforward. It's a new star in my collection!
Thanks for open-sourcing this project!
rust-playground
-
Rust: Box Is a Unique Type
If you have an object that's !Unpin, then Miri will not apply uniqueness rules to anything containing it [0], including boxes and &mut references. (In the example code, replacing the PhantomPinned with a () will make Miri complain again.) This is considered a temporary (if long-lived) measure to allow async executors to manipulate pinned futures without invalidating all their references and whatnot. Thus, it might be seen as undetected UB, in lieu of a permanent solution.
[0] https://play.rust-lang.org/?version=stable&mode=debug&editio...
-
Fivefold Slower Compared to Go? Optimizing Rust's Protobuf Decoding Performance
That would be true if you used `Vec::clear` too, it doesn't allocate a new vector. My point was that you still end up running Drop implementations with RepeatedField, just not all at once. See https://play.rust-lang.org/?version=stable&mode=debug&editio...
- Xz: Can you spot the single character that disabled Linux landlock?
-
How to Lose Control of Your Shell
That's a valid Unix path, but rust's quoting does nothing to stop it: https://play.rust-lang.org/?version=stable&mode=debug&editio...
-
Borrow Checking Without Lifetimes
Self-referential structs work fine in Rust and always have.
https://play.rust-lang.org/?version=stable&mode=debug&editio...
The compiler will correctly prevent you from moving the value.
The other way to have a struct that starts out as non-self-referential and then becomes self-referential can be achieved with `unsafe` and `Pin::new_unchecked`, which is how `async {}` is handled.
-
Improving Interoperability Between Rust and C++
In rust as currently stands: https://play.rust-lang.org/?version=stable&mode=debug&editio...
On the other hand, both this wrapper and yours are counterproductive if the element size is dynamic (e.g. perhaps you're dealing with some nonsense like:)
struct ITableColumn {
-
New Linux glibc flaw lets attackers get root on major distros
Overflow checks turn into two's compliments' wrapping, but that's only considered acceptable because bounds checks are not turned off.
https://play.rust-lang.org/?version=stable&mode=release&edit...
-
Atomics and Concurrency
I have no idea what you're talking about, but it sounds unnecessarily complicated and why I don't use Rust for any serious work.
This demonstrates the ABA problem in safe Rust: https://play.rust-lang.org/?version=stable&mode=debug&editio...
Substitute the sleep with a combination of doing computation/work and the OS thread scheduler, and you can see how the bug surfaces.
-
Rust 🦀 Installation + Hello World
You can also try Rust online using the Rust playground: https://play.rust-lang.org/
-
4B If Statements
(Click ... beside build to get assembly) https://play.rust-lang.org/?version=stable&mode=release&edit...
Unfortunately the go playground doesn't seem to support emitting assembly?
What are some alternatives?
usearch - Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍
tokio - A runtime for writing reliable asynchronous applications with Rust. Provides I/O, networking, scheduling, timers, ...
Simd - C++ image processing and machine learning library with using of SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.
trunk - Build, bundle & ship your Rust WASM application to the web.
aho-corasick - A fast implementation of Aho-Corasick in Rust.
mdBook - Create book from markdown files. Like Gitbook but implemented in Rust
rust-memchr - Optimized string search routines for Rust.
Rocket - A web framework for Rust.
popular-baby-names - 1, 000 most popular names for baby boys and girls in CSV and JSON formats. Generator written in Python.
egui - egui: an easy-to-use immediate mode GUI in Rust that runs on both web and native
rebar - A biased barometer for gauging the relative speed of some regex engines on a curated set of tasks.
sqlx - 🧰 The Rust SQL Toolkit. An async, pure Rust SQL crate featuring compile-time checked queries without a DSL. Supports PostgreSQL, MySQL, and SQLite.