StringZilla VS highway

Compare StringZilla and highway to see how they differ.

StringZilla

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc. 🦖 (by ashvardanian)
              StringZilla          highway
Mentions      14                   66
Stars         1,811                3,665
Growth        -                    2.4%
Activity      9.8                  9.8
Last commit   15 days ago          5 days ago
Language      C++                  C++
License       Apache License 2.0   Apache License 2.0
The number of mentions indicates the total number of mentions that we've tracked plus the number of user suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

StringZilla

Posts with mentions or reviews of StringZilla. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-27.
  • Measuring energy usage: regular code vs. SIMD code
    1 project | news.ycombinator.com | 19 Feb 2024
    The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when

    A. you do byte-level processing instead of float words;

    B. you use embedded, IoT, and other low-energy devices.

    A few years ago I compared an Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with a Core i9, and AVX-512-capable CPUs on substring search benchmarks.

    On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, which was an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...

  • Show HN: StringZilla v3 with C++, Rust, and Swift bindings, and AVX-512 and NEON
    1 project | news.ycombinator.com | 7 Feb 2024
  • How fast is rolling Karp-Rabin hashing?
    1 project | news.ycombinator.com | 4 Feb 2024
    This is extremely timely! I was working on SIMD variants for collision-resistant rolling-hash variants in the last few weeks for the v3 release of the StringZilla library [1].

    I have tried several 4-way and 8-way parallel variants using AVX-512 DQ instructions for 64-bit integer multiplications [2] as well as using integer FMA instructions on Arm NEON with 32-bit multiplications [3]. The latter needs a better mixing approach to be collision-resistant.

    So far I couldn't exceed 1 GB/s/core [4], so more research is needed. If you have any ideas - I am all ears!

    [1]: https://github.com/ashvardanian/StringZilla/blob/bc1869a8529...

    [2]: https://github.com/ashvardanian/StringZilla/blob/bc1869a8529...

    [3]: https://github.com/ashvardanian/StringZilla/blob/bc1869a8529...

    [4]: https://github.com/ashvardanian/StringZilla/tree/main-dev?ta...
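
For readers unfamiliar with the rolling update discussed in the post above, here is a minimal scalar sketch of a Karp-Rabin style polynomial hash. It is not StringZilla's code and not one of the collision-resistant SIMD variants the post describes: the function names, the base `257`, and the choice to let arithmetic wrap modulo 2^64 are all assumptions made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>
#include <vector>

// Polynomial (Karp-Rabin style) hash of every `window`-byte slice of `text`.
// Arithmetic wraps modulo 2^64; the base is an arbitrary odd constant chosen
// for illustration, not a value taken from StringZilla.
inline std::vector<std::uint64_t> rolling_hashes(std::string_view text, std::size_t window) {
    assert(window > 0);
    std::vector<std::uint64_t> out;
    if (text.size() < window) return out;

    constexpr std::uint64_t base = 257;

    // base^(window-1): the weight of the byte that leaves the window.
    std::uint64_t top = 1;
    for (std::size_t i = 1; i < window; ++i) top *= base;

    // Hash of the first window, computed directly with Horner's rule.
    std::uint64_t h = 0;
    for (std::size_t i = 0; i < window; ++i)
        h = h * base + static_cast<unsigned char>(text[i]);
    out.push_back(h);

    // Slide: remove the outgoing byte, shift by one position, add the incoming byte.
    for (std::size_t i = window; i < text.size(); ++i) {
        h -= top * static_cast<unsigned char>(text[i - window]);
        h = h * base + static_cast<unsigned char>(text[i]);
        out.push_back(h);
    }
    return out;
}

// Reference: hash one slice from scratch, used to cross-check the rolling update.
inline std::uint64_t direct_hash(std::string_view slice) {
    std::uint64_t h = 0;
    for (unsigned char c : slice) h = h * 257 + c;
    return h;
}
```

Each slide costs two multiplications regardless of the window length, which is why the multiplication throughput mentioned in the post tends to dominate once the loop is vectorized.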

  • 4B If Statements
    5 projects | news.ycombinator.com | 27 Dec 2023
    Jokes aside, lookup tables are a common technique to avoid costly operations. I was recently implementing one to avoid integer division. In my case I knew that the numerator and denominator were 8-bit unsigned integers, so I've replaced the division with 2 table lookups and 6 shifts and arithmetic operations [1]. The well-known `libdivide` [2] does that for arbitrary 16-, 32-, and 64-bit integers, and it has precomputed magic numbers and lookup tables for all 16-bit integers in the same repo.

    [1]: https://github.com/ashvardanian/StringZilla/blob/9f6ca3c6d3c...
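
As a rough illustration of trading a division for a table lookup, the sketch below uses a table of precomputed reciprocals. This is a simpler scheme than the one the post describes (one lookup, one multiply, one shift rather than 2 lookups and 6 operations), and it is neither StringZilla's nor `libdivide`'s implementation. The names `reciprocal_table` and `div_u8` are hypothetical.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// table[d] = ceil(2^16 / d) for d in 1..255; slot 0 is unused.
inline const std::array<std::uint32_t, 256>& reciprocal_table() {
    static const std::array<std::uint32_t, 256> table = [] {
        std::array<std::uint32_t, 256> t{};
        for (std::uint32_t d = 1; d < 256; ++d) t[d] = (65536u + d - 1) / d;
        return t;
    }();
    return table;
}

// Quotient n / d for 8-bit operands: one lookup, one multiply, one shift.
// Exact for all inputs because the rounding error of the reciprocal, scaled
// by n, stays below 2^16 when both operands are below 256.
inline std::uint8_t div_u8(std::uint8_t n, std::uint8_t d) {
    assert(d != 0);
    return static_cast<std::uint8_t>((n * reciprocal_table()[d]) >> 16);
}
```

With only 255 x 256 input pairs, the result can be checked exhaustively against the built-in `/` operator, which is a cheap way to validate any table-based replacement of this kind.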

  • Python, C, Assembly – Faster Cosine Similarity
    5 projects | news.ycombinator.com | 18 Dec 2023
    That matches my experience, and goes beyond GCC and Clang. Between 2018 and 2020 I was giving a lot of lectures on this topic and we did a bunch of case studies with Intel on their older ICC and what later became the OneAPI.

    Short story, unless you are doing trivial data-parallel operations, like in SimSIMD, compilers are practically useless. As a proof, I wrote what is now the StringZilla library (https://github.com/ashvardanian/stringzilla) and we've spent weeks with an Intel team, tuning the compiler, no result. So if you are processing a lot of strings, or variable-length coded data, like compression/decompression, hand-written SIMD kernels are pretty much unbeatable.

  • Stringzilla: 10x Faster SIMD-accelerated String Class
    1 project | /r/programming | 30 Aug 2023
  • Stringzilla: 10x faster SIMD-accelerated Python `str` class
    2 projects | /r/Python | 30 Aug 2023
    Blog post
  • Stringzilla: Fastest string sort, search, split, and shuffle using SIMD
    9 projects | news.ycombinator.com | 29 Aug 2023
    Copying my feedback from reddit[1], where I discussed it in the context of the `memchr` crate.[2]

    I took a quick look at your library implementation and have some notes:

    * It doesn't appear to query CPUID, so I imagine the only way it uses AVX2 on x86-64 is if the user compiles with that feature enabled explicitly. (Or uses something like [`x86-64-v3`](https://en.wikipedia.org/wiki/X86-64#Microarchitecture_level...).) The `memchr` crate doesn't need that. It will use AVX2 even if the program isn't compiled with AVX2 enabled so long as the current CPU supports it.

    * Your substring routines have multiplicative worst case (that is, `O(m * n)`) running time. The `memchr` crate only uses SIMD for substring search for smallish needles. Otherwise it flips over to Two-Way with a SIMD prefilter. You'll be fine for short needles, but things could go very very badly for longer needles.

    * It seems quite likely that your [confirmation step](https://github.com/ashvardanian/Stringzilla/blob/fab854dc4fd...) is going to absolutely kill performance for even semi-frequently occurring candidates. The [`memchr` crate utilizes information from the vector step to limit where and when it calls `memcmp`](https://github.com/BurntSushi/memchr/blob/46620054ff25b16d22...). Your code might do well in cases where matches are very rare. I took a quick peek at your benchmarks and don't see anything that obviously stresses this particular case. For substring search, the `memchr` crate uses a variant of the "[generic SIMD](http://0x80.pl/articles/simd-strfind.html#first-and-last)" algorithm. Basically, it takes two bytes from the needle, looks for positions where those occur and then attempts to check whether that position corresponds to a match. It looks like your technique uses the first 4 bytes. I suspect that might be overkill. (I did try using 3 bytes from the needle and found that it was a bit slower in some cases.) That is, two bytes is usually enough predictive power to lower the false positive rate enough. Of course, one can write pathological inputs that cause either one to do better than the other. (The `memchr` crate benchmark suite has a [collection of pathological inputs](https://github.com/BurntSushi/memchr/blob/46620054ff25b16d22...).)

    It would actually be possible to hook Stringzilla up to `memchr`'s benchmark suite if you were interested. :-)

    [1]: https://old.reddit.com/r/rust/comments/163ph8r/memchr_26_now...

    [2]: https://github.com/BurntSushi/memchr
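
The "two bytes from the needle" filter described in the post above can be sketched in scalar C++ as follows. This is only an illustration of the idea, assuming the first and last needle bytes are the two chosen. It is not the `memchr` crate's code (which is Rust and vectorized) nor StringZilla's, and `find_first_last` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string_view>

// Returns the offset of the first occurrence of `needle`, or npos.
inline std::size_t find_first_last(std::string_view haystack, std::string_view needle) {
    const std::size_t n = haystack.size(), m = needle.size();
    if (m == 0) return 0;
    if (m > n) return std::string_view::npos;
    assert(m <= n);

    const char first = needle.front(), last = needle.back();
    for (std::size_t i = 0; i + m <= n; ++i) {
        // Cheap filter on two bytes; a SIMD version tests many offsets i at once.
        if (haystack[i] != first || haystack[i + m - 1] != last) continue;
        // Confirmation step: the first and last bytes are already known to match.
        if (m <= 2 || std::memcmp(haystack.data() + i + 1, needle.data() + 1, m - 2) == 0)
            return i;
    }
    return std::string_view::npos;
}
```

The worst case is still `O(m * n)`: an input where the two filter bytes match at nearly every offset forces a `memcmp` each time, which is the pathological behavior the post warns about.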

  • Show HN: Faking SIMD to Search and Sort Strings 5x Faster
    1 project | news.ycombinator.com | 26 Aug 2023
    I took a look at Stringzilla (https://github.com/ashvardanian/stringzilla), and in addition to the impressive benchmarks, the API looks pretty straightforward. It's a new star in my collection!

    Thanks for open-sourcing this project!

highway

Posts with mentions or reviews of highway. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2024-03-31.
  • Llamafile 0.7 Brings AVX-512 Support: 10x Faster Prompt Eval Times for AMD Zen 4
    3 projects | news.ycombinator.com | 31 Mar 2024
    The bf16 dot instruction replaces 6 instructions: https://github.com/google/highway/blob/master/hwy/ops/x86_12...
  • JPEG XL and the Pareto Front
    9 projects | news.ycombinator.com | 1 Mar 2024
    [0] for those interested in Highway.

    It's also mentioned in [1], which starts off

    > Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures. Below we discuss how we achieved this.

    [0] https://github.com/google/highway

    [1] https://opensource.googleblog.com/2022/06/Vectorized%20and%2..., which has an associated paper at https://arxiv.org/pdf/2205.05982.pdf.

  • Gemma.cpp: lightweight, standalone C++ inference engine for Gemma models
    7 projects | news.ycombinator.com | 23 Feb 2024
    Thanks so much!

    Everyone working on this self-selected into contributing, so I think of it less as my team than ... a team?

    Specifically want to call out: Jan Wassenberg (author of https://github.com/google/highway) and I started gemma.cpp as a small project just a few months ago + Phil Culliton, Dan Zheng, and Paul Chang + of course the GDM Gemma team.

  • From slow to SIMD: A Go optimization story
    10 projects | news.ycombinator.com | 23 Jan 2024
    C++ users can enjoy Highway [1].

    [1] https://github.com/google/highway/

  • GDlog: A GPU-Accelerated Deductive Engine
    16 projects | news.ycombinator.com | 3 Dec 2023
  • Designing a SIMD Algorithm from Scratch
    3 projects | news.ycombinator.com | 28 Nov 2023
    At that point it is better to have some kind of DSL that should not be in the main language, because it would target a much lower level than a typical program. The best effort I've seen in this scene was Google's Highway [1] (not to be confused with HighwayHash) and I even once attempted to recreate it in Rust, but it is still distanced from my ideal.

    [1] https://github.com/google/highway

  • SIMD Everywhere Optimization from ARM Neon to RISC-V Vector Extensions
    6 projects | news.ycombinator.com | 29 Sep 2023
    Interesting, thanks for sharing :)

    At the time we open-sourced Highway, the standardization process had already started and there were some discussions.

    I'm curious why stdlib is the only path you see to default? Compare the activity level of https://github.com/VcDevel/std-simd vs https://github.com/google/highway. As to open-source usage, after years of std::experimental, I see <200 search hits [1], vs >400 for Highway [2], even after excluding several library users.

    But that aside, I'm not convinced standardization is the best path for a SIMD library. We and external users extend Highway on a weekly basis as new use cases arise. What if we deferred those changes to 3-monthly meetings, or had to wait for one meeting per WD, CD, (FCD), DIS, (FDIS) stage before it's standardized? Standardization seems more useful for rarely-changing things.

    1: https://sourcegraph.com/search?q=context:global+std::experim...

    2: https://sourcegraph.com/search?q=context:global+HWY_NAMESPAC...

  • Permuting Bits with GF2P8AFFINEQB
    1 project | news.ycombinator.com | 27 Sep 2023
    Thanks for the link. We were previously using GFNI for bit reversal and 8-bit shifts, and I just extended that to our 8-bit BroadcastSignBit (https://github.com/google/highway/pull/1784).
  • Six times faster than C
    4 projects | news.ycombinator.com | 6 Jul 2023
    You could study Google's Highway library [1].

    [1] https://github.com/google/highway

  • AMD EPYC 97x4 “Bergamo” CPUs: 128 Zen 4c CPU Cores for Servers, Shipping Now
    1 project | news.ycombinator.com | 24 Jun 2023
    Runtime feature detection need not be rare nor hard, it's a few dozen lines of boilerplate. You can even write your code just once: see https://github.com/google/highway#examples.
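
A minimal sketch of the kind of boilerplate the post above refers to: detect a CPU feature once, then route calls through a function pointer. This is not Highway's dispatch mechanism. It assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin is available, and falls back to the scalar path everywhere else. All function names are hypothetical, and `count_wide` is only a stand-in for a real AVX2 kernel.

```cpp
#include <cstddef>

using count_fn = std::size_t (*)(const char*, std::size_t, char);

// Portable baseline: count occurrences of `c` one byte at a time.
inline std::size_t count_scalar(const char* p, std::size_t n, char c) {
    std::size_t total = 0;
    for (std::size_t i = 0; i < n; ++i) total += (p[i] == c);
    return total;
}

// Stand-in for a hand-written AVX2 kernel (hypothetical). A real one would use
// intrinsics and be compiled with AVX2 enabled for this function only.
inline std::size_t count_wide(const char* p, std::size_t n, char c) {
    return count_scalar(p, n, c);
}

// Runtime check; compilers or targets without the builtin report "no AVX2".
inline bool cpu_has_avx2() {
#if (defined(__x86_64__) || defined(__i386__)) && (defined(__GNUC__) || defined(__clang__))
    return __builtin_cpu_supports("avx2") != 0;
#else
    return false;
#endif
}

// The choice is made once, on first use, and cached.
inline count_fn pick_count() {
    static const count_fn chosen = cpu_has_avx2() ? &count_wide : &count_scalar;
    return chosen;
}

inline std::size_t count_byte(const char* p, std::size_t n, char c) {
    return pick_count()(p, n, c);
}
```

Callers only ever use `count_byte`, so the binary can be built without AVX2 enabled globally and still take the faster path on CPUs that support it.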

What are some alternatives?

When comparing StringZilla and highway you can also consider the following projects:

usearch - Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

xsimd - C++ wrappers for SIMD intrinsics and parallelized, optimized mathematical functions (SSE, AVX, AVX512, NEON, SVE)

Simd - C++ image processing and machine learning library with using of SIMD: SSE, AVX, AVX-512, AMX for x86/x64, VMX(Altivec) and VSX(Power7) for PowerPC, NEON for ARM.

Vc - SIMD Vector Classes for C++

aho-corasick - A fast implementation of Aho-Corasick in Rust.

swup - Versatile and extensible page transition library for server-rendered websites 🎉

rust-memchr - Optimized string search routines for Rust.

DirectXMath - DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps

popular-baby-names - 1,000 most popular names for baby boys and girls in CSV and JSON formats. Generator written in Python.

riscv-v-spec - Working draft of the proposed RISC-V V vector extension

rebar - A biased barometer for gauging the relative speed of some regex engines on a curated set of tasks.

jpeg-xl