Top 23 AVX2 Open-Source Projects
-
simdjson
Parsing gigabytes of JSON per second: used by Facebook/Meta Velox, the Node.js runtime, ClickHouse, WatermelonDB, Apache Doris, Milvus, StarRocks
Project mention: 1BRC Merykitty's Magic SWAR: 8 Lines of Code Explained in 3k Words | news.ycombinator.com | 2024-03-09
See [0] for those interested in Highway.
It's also mentioned in [1], which starts off:
> Today we're sharing open source code that can sort arrays of numbers about ten times as fast as the C++ std::sort, and outperforms state of the art architecture-specific algorithms, while being portable across all modern CPU architectures. Below we discuss how we achieved this.
[0] https://github.com/google/highway
[1] https://opensource.googleblog.com/2022/06/Vectorized%20and%2..., which has an associated paper at https://arxiv.org/pdf/2205.05982.pdf.
-
CTranslate2
Project mention: Distil-Whisper: distilled version of Whisper that is 6 times faster, 49% smaller | news.ycombinator.com | 2023-10-31
Just a point of clarification: faster-whisper references it, but ctranslate2 [0] is what's really doing the magic here.
Ctranslate2 is a sleeper powerhouse project that enables a lot. They should be up front and center and get the credit they deserve.
-
simde
I was curious about these libraries a few weeks ago and did some searching. Is there one that's got a clearly dominating set of users or contributors?
I don't know what a good way to compare these might be, other than perhaps activity/contributor count.
[1] https://github.com/simd-everywhere/simde
[2] https://github.com/ermig1979/Simd
[3] https://github.com/google/highway
-
StringZilla
Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc. 🦖
Project mention: Measuring energy usage: regular code vs. SIMD code | news.ycombinator.com | 2024-02-19
The 3.5x energy-efficiency gap between serial and SIMD code becomes even larger when
A. you do byte-level processing instead of float words;
B. you use embedded, IoT, and other low-energy devices.
A few years ago I compared Nvidia Jetson Xavier (long before the Orin release), an Intel-based MacBook Pro with a Core i9, and AVX-512-capable CPUs on substring search benchmarks.
On Xavier one can quite easily disable/enable cores and reconfigure power usage. At peak I got to 4.2 GB/J, which was an 8.3x improvement in efficiency over LibC in substring search operations. The comparison table is still available in the older README: https://github.com/ashvardanian/StringZilla/tree/v2.0.2?tab=...
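Both the 1BRC write-up and StringZilla's description mention SWAR ("SIMD within a register"): treating a 64-bit integer as eight byte lanes. As a rough illustration, here is a minimal sketch of the well-known "has zero byte" trick applied to finding a delimiter such as `;`. It is not StringZilla's or Merykitty's code; `swar_find_byte` and its helpers are hypothetical names.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Copy one byte into all eight byte lanes of a 64-bit word. */
static inline uint64_t broadcast8(uint8_t c) {
    return 0x0101010101010101ULL * c;
}

/* Nonzero iff at least one byte of w is zero (classic "has zero byte" test). */
static inline uint64_t has_zero_byte(uint64_t w) {
    return (w - 0x0101010101010101ULL) & ~w & 0x8080808080808080ULL;
}

/* Hypothetical helper: first occurrence of c in s[0..n), or NULL. */
static const char *swar_find_byte(const char *s, size_t n, char c) {
    const uint64_t pattern = broadcast8((uint8_t)c);
    size_t i = 0;
    /* Scan 8 bytes per step; bytes equal to c become zero after the XOR. */
    for (; i + 8 <= n; i += 8) {
        uint64_t w;
        memcpy(&w, s + i, sizeof w); /* safe unaligned load */
        if (has_zero_byte(w ^ pattern))
            break; /* a match lies somewhere in s[i..i+8) */
    }
    /* Pin down the exact byte; this also covers a tail shorter than 8 bytes. */
    for (; i < n; i++)
        if (s[i] == c)
            return s + i;
    return NULL;
}
```

Falling back to a byte loop once a word reports a hit keeps the sketch independent of endianness and avoids the false positives the trick can report in bytes above the first match; real SIMD code does the same comparison on 32 or 64 bytes at a time.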
-
DirectXMath
DirectXMath is an all inline SIMD C++ linear algebra library for use in games and graphics apps
For those unfamiliar, like I was, DXM is DirectXMath.
-
CRoaring
Roaring bitmaps in C (and C++), with SIMD (AVX2, AVX-512 and NEON) optimizations: used by Apache Doris, ClickHouse, and StarRocks
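Roaring bitmaps split each 32-bit value into a 16-bit chunk key (high bits) and a 16-bit offset (low bits) stored in a per-chunk container: a sorted array while the chunk holds at most 4096 values, otherwise a 65536-bit bitmap. The sketch below is a simplified, hypothetical model of the bitmap side only (it ignores array storage and run containers) to show where SIMD pays off: set operations and cardinality become flat word-wise loops. None of these names are CRoaring's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Roaring keeps at most 4096 values in a sorted-array container. */
#define ARRAY_CONTAINER_MAX 4096

/* A bitmap container covers one 2^16 chunk: 65536 bits = 1024 words = 8 KiB. */
typedef struct {
    uint64_t words[1024];
} bitmap_container;

/* High 16 bits select the container (chunk key); low 16 bits go inside it. */
static uint16_t chunk_key(uint32_t x) { return (uint16_t)(x >> 16); }
static uint16_t chunk_low(uint32_t x) { return (uint16_t)(x & 0xFFFFu); }

static void bc_clear(bitmap_container *bc) { memset(bc->words, 0, sizeof bc->words); }

static void bc_add(bitmap_container *bc, uint16_t v) {
    bc->words[v >> 6] |= UINT64_C(1) << (v & 63);
}

static int bc_contains(const bitmap_container *bc, uint16_t v) {
    return (int)((bc->words[v >> 6] >> (v & 63)) & 1u);
}

/* Portable SWAR popcount; optimized libraries typically use hardware POPCNT or SIMD. */
static int popcount64(uint64_t x) {
    x = x - ((x >> 1) & 0x5555555555555555ULL);
    x = (x & 0x3333333333333333ULL) + ((x >> 2) & 0x3333333333333333ULL);
    x = (x + (x >> 4)) & 0x0F0F0F0F0F0F0F0FULL;
    return (int)((x * 0x0101010101010101ULL) >> 56);
}

/* |a AND b|: a flat word-wise loop, the kind of kernel AVX2 code accelerates. */
static int bc_and_cardinality(const bitmap_container *a, const bitmap_container *b) {
    int n = 0;
    for (size_t i = 0; i < 1024; i++)
        n += popcount64(a->words[i] & b->words[i]);
    return n;
}

/* Which of the two basic container kinds suits a chunk with this many values. */
static const char *container_kind(int cardinality) {
    return cardinality <= ARRAY_CONTAINER_MAX ? "array" : "bitmap";
}
```

The 4096 threshold is where the two layouts cost the same: 4096 sixteen-bit entries occupy 8 KiB, exactly the size of a bitmap container.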
-
simdutf
Unicode routines (UTF8, UTF16, UTF32): billions of characters per second using SSE2, AVX2, NEON, AVX-512, RISC-V Vector Extension. Part of Node.js and Bun.
Project mention: Vectorizing Unicode conversions on real RISC-V hardware | news.ycombinator.com | 2024-01-27
The project was mostly inspired by simdutf [0] which has been around for a couple of years already, and I don't think iconv has any of its vectorized implementations for other architectures.
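A common first step in fast Unicode transcoders is an ASCII fast path: if no byte has its high bit set, the input is already valid UTF-8 and widens to UTF-16 or UTF-32 by zero-extension. The sketch below checks this with 8-byte SWAR words; `is_ascii_swar` is a hypothetical name, not part of simdutf's API, and vectorized libraries do the same test on 16- to 64-byte registers.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 if every byte in buf is < 0x80 (pure ASCII). */
static int is_ascii_swar(const unsigned char *buf, size_t len) {
    uint64_t acc = 0;
    size_t i = 0;
    /* OR 8 bytes at a time; any high bit set anywhere survives in acc. */
    for (; i + 8 <= len; i += 8) {
        uint64_t w;
        memcpy(&w, buf + i, sizeof w); /* safe unaligned load */
        acc |= w;
    }
    /* Fold the remaining bytes into the low lane. */
    for (; i < len; i++)
        acc |= buf[i];
    /* Test the high bit of every byte lane at once. */
    return (acc & 0x8080808080808080ULL) == 0;
}
```

Accumulating with OR and testing once at the end keeps the hot loop branch-free; the cost is that the function cannot report where the first non-ASCII byte is.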
-
highwayhash
Native Go version of HighwayHash with optimized assembly implementations on Intel and ARM. Able to process over 10 GB/sec on a single core on Intel CPUs - https://en.wikipedia.org/wiki/HighwayHash (by minio)
Project mention: Can I concatenate multiple non-crypto hash functions to reduce collision? | /r/golang | 2023-05-16
highwayhash (alt) provides 256 bits. Fast mainly for larger inputs and supports seeds.
-
eve
C++ offers tools for writing better APIs, and since the addition of concepts in C++20 it offers much better API enforcement. Writing an equivalent to libraries such as {fmt} or EVE is not possible in anything we’d call C.
-
x86-simd-sort
Project mention: SIMD based custom object and key-value pair sorting in C++ | news.ycombinator.com | 2024-02-14
-
TurboPFor
Project mention: Show HN: Time Series Benchmark TurboPFor,TurboFloat,TurboFloat LzX,TurboGorilla | news.ycombinator.com | 2023-06-25
-
SimSIMD
Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐
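SimSIMD's description centers on inner products and distance metrics. Below are scalar C versions of two such kernels, with the dot product split across four independent accumulators the way a 4-lane SIMD register would hold partial sums. This is an illustrative sketch; `dot_f32` and `l2sq_f32` are hypothetical names, not SimSIMD's API.

```c
#include <stddef.h>

/* Inner product using four independent partial sums, mimicking 4 SIMD lanes. */
static float dot_f32(const float *a, const float *b, size_t n) {
    float acc[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    size_t i = 0;
    for (; i + 4 <= n; i += 4)
        for (size_t k = 0; k < 4; k++)
            acc[k] += a[i + k] * b[i + k];
    /* Horizontal reduction of the lanes, then the scalar tail. */
    float sum = (acc[0] + acc[1]) + (acc[2] + acc[3]);
    for (; i < n; i++)
        sum += a[i] * b[i];
    return sum;
}

/* Squared Euclidean distance; skipping sqrt preserves nearest-neighbor ranking. */
static float l2sq_f32(const float *a, const float *b, size_t n) {
    float sum = 0.0f;
    for (size_t i = 0; i < n; i++) {
        float d = a[i] - b[i];
        sum += d * d;
    }
    return sum;
}
```

Splitting the sum reorders floating-point additions, so results can differ in the last bits from a strictly sequential loop; that trade-off is inherent to vectorized reductions.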
-
fastbase64
How does this compare to fastbase64[0]? Great article, I'm happy to see this sort of thing online. I wish I could share the author's optimism about portable SIMD libraries.
-
Thorium-Win-AVX2
Repo to serve AVX2 Windows builds of Thorium. https://github.com/Alex313031/Thorium/
FYI a number of streaming sites won't work - while this has Widevine, it does not have Verified Media Path (VMP) which verifies that you're running a signed binary. https://github.com/Alex313031/Thorium-Win-AVX2/issues/84#iss...
https://github.com/castlabs/electron-releases is an interesting Electron fork with full Widevine+VMP support - but it's very much closed-source.
-
toys
I think all of these techniques check whether the input string is correct. For example see here https://github.com/WojciechMula/toys/blob/master/lookup-in-s...
-
Turbo-Base64
Project mention: Show HN: The fastest Turbo-Base64 now for Python | news.ycombinator.com | 2023-08-24
Cython bindings for Turbo Base64 [1]
- 20-30x faster than the standard library
- Benchmarks faster than any other C base64 library
- Fastest implementation of AVX, AVX2, and AVX512 base64 encoding
- No other dependencies
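The Turbo-Base64 and fastbase64 entries are about accelerating base64. The core mapping, three input bytes packed into 24 bits and split into four 6-bit indices into a 64-character alphabet, is shown below as a plain scalar encoder following RFC 4648 with `=` padding; SIMD versions perform the same bit shuffling on many bytes per instruction. This is a minimal reference sketch, not either library's code, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* RFC 4648 standard alphabet: index 0..63 -> output character. */
static const char B64_ALPHABET[] =
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

/* Hypothetical helper: encodes len bytes into out, which must hold
   4 * ((len + 2) / 3) + 1 chars. Returns the number of chars written. */
static size_t b64_encode(const uint8_t *in, size_t len, char *out) {
    size_t i = 0, o = 0;
    /* Main loop: pack 3 bytes into 24 bits, emit four 6-bit indices. */
    for (; i + 3 <= len; i += 3) {
        uint32_t v = ((uint32_t)in[i] << 16) | ((uint32_t)in[i + 1] << 8) | in[i + 2];
        out[o++] = B64_ALPHABET[(v >> 18) & 63];
        out[o++] = B64_ALPHABET[(v >> 12) & 63];
        out[o++] = B64_ALPHABET[(v >> 6) & 63];
        out[o++] = B64_ALPHABET[v & 63];
    }
    /* Tail: 1 or 2 leftover bytes are zero-padded and marked with '='. */
    size_t rem = len - i;
    if (rem > 0) {
        uint32_t v = (uint32_t)in[i] << 16;
        if (rem == 2)
            v |= (uint32_t)in[i + 1] << 8;
        out[o++] = B64_ALPHABET[(v >> 18) & 63];
        out[o++] = B64_ALPHABET[(v >> 12) & 63];
        out[o++] = rem == 2 ? B64_ALPHABET[(v >> 6) & 63] : '=';
        out[o++] = '=';
    }
    out[o] = '\0';
    return o;
}

/* Convenience wrapper for NUL-terminated strings. */
static size_t b64_encode_cstr(const char *s, char *out) {
    return b64_encode((const uint8_t *)s, strlen(s), out);
}
```

For example, `"Man"` encodes to `"TWFu"`. The alphabet lookup is the step SIMD encoders typically replace with arithmetic on byte ranges, since a 64-entry table does not fit a single byte-shuffle instruction.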
-
AVX2 related posts
- Training great LLMs from ground zero in the wilderness as a startup
- Measuring energy usage: regular code vs. SIMD code
- Show HN: StringZilla v3 with C++, Rust, and Swift bindings, and AVX-512 and NEON
- How fast is rolling Karp-Rabin hashing?
- From slow to SIMD: A Go optimization story
- simdjson: Parsing Gigabytes of JSON per Second
- 4B If Statements
-
A note from our sponsor - WorkOS
workos.com | 28 Mar 2024
Index
What are some of the best open-source AVX2 projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | simdjson | 18,275 |
2 | highway | 3,559 |
3 | CTranslate2 | 2,667 |
4 | simde | 2,127 |
5 | StringZilla | 1,660 |
6 | DirectXMath | 1,477 |
7 | CRoaring | 1,425 |
8 | Vc | 1,405 |
9 | libsimdpp | 1,180 |
10 | simdutf | 910 |
11 | highwayhash | 852 |
12 | eve | 833 |
13 | x86-simd-sort | 790 |
14 | TurboPFor | 736 |
15 | SimSIMD | 671 |
16 | simdutf8 | 505 |
17 | fastbase64 | 416 |
18 | Thorium-Win-AVX2 | 347 |
19 | nsimd | 310 |
20 | toys | 308 |
21 | sse-popcount | 303 |
22 | TurboRLE | 275 |
23 | Turbo-Base64 | 248 |