Python, C, Assembly – Faster Cosine Similarity

SimSIMD

Mentions: 15 · Stars: 715 · Activity: 9.6 · Language: C

Up to 200x Faster Inner Products and Vector Similarity — for Python, JavaScript, Rust, and C, supporting f64, f32, f16 real & complex, i8, and binary vectors using SIMD for both x86 AVX2 & AVX-512 and Arm NEON & SVE 📐

Kahan (compensated) summation is also commonly used in such cases, but I believe there is room for improvement without going to those extremes. First of all, we should tune the epsilon here: https://github.com/ashvardanian/SimSIMD/blob/f8ff727dcddcd14...
As for the 64-bit version, it's harder, as the higher-precision `rsqrt` approximations are only available with "AVX512ER". I'm not sure which CPUs support that, but it's not available on Sapphire Rapids.
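
To make the Kahan reference above concrete, here is a minimal sketch of compensated summation applied to a dot product. It is plain Python on 64-bit floats and only illustrates the technique; it is not SimSIMD's code, and the function names `dot_naive` and `dot_kahan` are hypothetical.

```python
import math

def dot_naive(a, b):
    """Plain left-to-right accumulation; small terms can be rounded away."""
    total = 0.0
    for x, y in zip(a, b):
        total += x * y
    return total

def dot_kahan(a, b):
    """Dot product with Kahan (compensated) summation."""
    total = 0.0
    compensation = 0.0  # running estimate of the low-order bits lost so far
    for x, y in zip(a, b):
        term = x * y - compensation
        new_total = total + term
        # (new_total - total) is what was actually added; the difference
        # from `term` is the rounding error of this step.
        compensation = (new_total - total) - term
        total = new_total
    return total

if __name__ == "__main__":
    # One large term followed by many terms too small to register on their own.
    a = [1.0] + [1e-16] * 10000
    b = [1.0] * 10001
    reference = math.fsum(x * y for x, y in zip(a, b))
    print("naive error:", abs(dot_naive(a, b) - reference))
    print("kahan error:", abs(dot_kahan(a, b) - reference))
```

The extra subtraction and addition per element is the cost the comment alludes to: the compensated loop does roughly four times the floating-point work of the naive one.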

usearch

Mentions: 20 · Stars: 1,629 · Activity: 9.8 · Language: C++

Fast Open-Source Search & Clustering engine × for Vectors & 🔜 Strings × in C++, C, Python, JavaScript, Rust, Java, Objective-C, Swift, C#, GoLang, and Wolfram 🔍

The hardest (still missing) part of efficient cosine distance computation is picking a good epsilon for the `sqrt` calculation and avoiding "division by zero" problems.
We have an open issue about it in USearch and a related one in SimSIMD itself, so if you have any suggestions, please share your insights - they would impact millions of devices using the library (directly on servers and mobile, and through projects like ClickHouse and some of the Google repos): https://github.com/unum-cloud/usearch/issues/320
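
A minimal sketch of the problem described above, assuming the epsilon is added to each squared norm before the square root. The function name `cosine_distance` and the default `epsilon=1e-9` are illustrative placeholders, not the values or code used in USearch or SimSIMD.

```python
import math

def cosine_distance(a, b, epsilon=1e-9):
    """Cosine distance with an additive epsilon guarding the division.

    `epsilon` is an arbitrary placeholder; choosing it well is exactly
    the open question discussed in the comment above.
    """
    ab = a2 = b2 = 0.0
    for x, y in zip(a, b):
        ab += x * y
        a2 += x * x
        b2 += y * y
    # Taking the square roots separately avoids overflow in a2 * b2.
    return 1.0 - ab / (math.sqrt(a2 + epsilon) * math.sqrt(b2 + epsilon))

if __name__ == "__main__":
    print(cosine_distance([0.0, 0.0], [0.0, 0.0]))    # no ZeroDivisionError
    print(cosine_distance([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))
    # Identical but tiny vectors: epsilon dominates the squared norms,
    # so the reported distance is far from the true value of 0.
    print(cosine_distance([1e-6, 0.0], [1e-6, 0.0]))
```

The last call shows the trade-off: an epsilon large enough to make zero vectors safe also distorts results for vectors whose squared norm is comparable to it, which is why a single constant is hard to pick across `f16`, `f32`, and `f64` inputs.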

StringZilla

Mentions: 14 · Stars: 1,791 · Activity: 9.8 · Language: C++

Up to 10x faster strings for C, C++, Python, Rust, and Swift, leveraging SWAR and SIMD on Arm Neon and x86 AVX2 & AVX-512-capable chips to accelerate search, sort, edit distances, alignment scores, etc 🦖

That matches my experience, and it goes beyond GCC and Clang. Between 2018 and 2020 I gave a lot of lectures on this topic, and we did a bunch of case studies with Intel on their older ICC and what later became oneAPI.
Long story short: unless you are doing trivial data-parallel operations, like in SimSIMD, compilers are practically useless. As proof, I wrote what is now the StringZilla library (https://github.com/ashvardanian/stringzilla), and we spent weeks with an Intel team tuning the compiler, with no result. So if you are processing a lot of strings or variable-length coded data, as in compression/decompression, hand-written SIMD kernels are pretty much unbeatable.
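
For readers unfamiliar with the SWAR technique named in the StringZilla description, here is a minimal sketch of the classic "find a zero byte in a word" bit trick, used to search for a byte eight positions at a time. It is written in Python purely to show the logic and will not be fast; the function name `swar_find_byte` is hypothetical and this is not StringZilla's implementation.

```python
MASK64 = 0xFFFFFFFFFFFFFFFF
LOW_BITS = 0x0101010101010101   # 0x01 in every byte lane
HIGH_BITS = 0x8080808080808080  # 0x80 in every byte lane

def swar_find_byte(haystack: bytes, needle: int) -> int:
    """Return the index of the first byte equal to `needle`, or -1."""
    pattern = LOW_BITS * needle  # broadcast the needle into all 8 lanes
    n = len(haystack)
    i = 0
    while i + 8 <= n:
        # Little-endian load: the first byte lands in the lowest lane.
        word = int.from_bytes(haystack[i:i + 8], "little")
        x = word ^ pattern  # a lane becomes 0x00 exactly where bytes match
        # Sets the high bit of every zero lane. Lanes above the first zero
        # lane may be flagged spuriously, but the lowest flag is exact.
        hit = (x - LOW_BITS) & ~x & HIGH_BITS & MASK64
        if hit:
            lowest_bit = hit & -hit
            return i + (lowest_bit.bit_length() - 1) // 8
        i += 8
    # Scalar tail for the last len(haystack) % 8 bytes.
    for j in range(i, n):
        if haystack[j] == needle:
            return j
    return -1

if __name__ == "__main__":
    print(swar_find_byte(b"hello, world", ord("w")))
```

A compiler is unlikely to derive this kind of lane-wise reasoning from a byte-by-byte loop with an early exit, which is the sort of case the comment above has in mind.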

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
