Show HN: Fast Vector Similarity Using Rust and Python

fast_vector_similarity

Mentions: 7 | Stars: 324 | Activity: 7.2 | Language: Rust

The Fast Vector Similarity Library is designed to provide efficient computation of various similarity measures between vectors.
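To make "similarity measure between vectors" concrete, here is a minimal pure-Python sketch of one familiar measure, cosine similarity. It is only an illustration: the function name is hypothetical, and nothing here is taken from the library's own API or implies which measures it actually provides.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (illustrative helper)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    # Parallel vectors score close to 1, orthogonal vectors close to 0.
    print(cosine_similarity([1.0, 2.0, 3.0], [2.0, 4.0, 6.0]))
    print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
```

A compiled implementation (Rust, C, or NumPy) would compute the same quantity; the loop structure above is just the easiest form to read.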

Yeah, like the other commenter said, everything is in this file here:
https://github.com/Dicklesworthstone/fast_vector_similarity/...
If you also build your project with Rust and Maturin, you can literally just copy and paste that file into your project because it's totally generic, and if the repo is public, GitHub will run it all for you for free.
The only thing is you need to create an account on PyPI (the index that pip installs from) and enable two-factor authentication so you can generate an API token. Then go into the repo settings, open the secrets section, and create a GitHub Actions secret named PYPI_API_TOKEN whose value is your PyPI token. That's it! It will not only compile all the wheels for you but also upload the project to PyPI using the settings found in your pyproject.toml file, like this:
https://github.com/Dicklesworthstone/fast_vector_similarity/...

np-sims

Mentions: 2 | Stars: 12 | Activity: 8.6 | Language: Python

numpy ufuncs for vector similarity

Nice!
I recently implemented a C-based NumPy solution that uses LSH to compress / recover cosine similarity[1]. It was my first time writing C code for NumPy, and it was a lot of fun to massively improve the performance over pure Python[2].
[1] https://github.com/softwaredoug/np-sims
[2] https://softwaredoug.com/blog/2023/08/22/rand-projections-in...
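The idea of compressing vectors with random projections and then recovering an approximate cosine similarity can be sketched roughly as follows. This is a pure-Python illustration of sign random projections, not the np-sims implementation (which the comment describes as C-based); the function names, the seed, and the bit counts are assumptions made for this example.

```python
import math
import random


def random_hyperplanes(dim, n_bits, seed=0):
    """Draw n_bits random hyperplane normals with Gaussian components."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def signature(vec, planes):
    """One bit per hyperplane: 1 if the vector lies on its non-negative side."""
    return [
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    ]


def estimated_cosine(sig_a, sig_b):
    """Recover an approximate cosine from the fraction of differing bits.

    For random hyperplanes, the probability that a bit differs is angle / pi,
    so the Hamming fraction times pi estimates the angle between the vectors.
    """
    hamming = sum(x != y for x, y in zip(sig_a, sig_b))
    angle = math.pi * hamming / len(sig_a)
    return math.cos(angle)


if __name__ == "__main__":
    planes = random_hyperplanes(dim=3, n_bits=1024, seed=1)
    a = [1.0, 0.0, 0.0]
    b = [1.0, 1.0, 0.0]
    # The true cosine is about 0.707; the estimate gets closer with more bits.
    print(estimated_cosine(signature(a, planes), signature(b, planes)))
```

Each vector shrinks to n_bits bits regardless of its original precision, and the estimate's accuracy improves as n_bits grows.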

simsimd

Mentions: 1 | Stars: - | Activity: -

It’s a good start, but you can’t generally get even remotely close to hardware potential in Rust, let alone Python.
I had to implement a separate C99 library that always uses the newest SIMD intrinsics, occasionally leveraging SVE on more recent ARM CPUs, which compilers don’t know how to generate on their own.
That library is in turn used in USearch, which is designed for approximate search, but some users recently reported that they use it for brute-force search as well… where it performed 20x faster than FAISS.
https://github.com/unum-cloud/simsimd
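For readers unfamiliar with the distinction, "brute force" here means comparing the query against every stored vector, with no index. A minimal pure-Python sketch of exact top-k search is below; it says nothing about how SimSIMD, USearch, or FAISS implement it, and the function names are hypothetical.

```python
import heapq


def squared_l2(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def brute_force_top_k(query, vectors, k):
    """Return the k nearest vectors as (distance, index) pairs, closest first.

    Every vector is scored, so the result is exact; the cost grows linearly
    with the number of stored vectors.
    """
    scored = ((squared_l2(query, v), i) for i, v in enumerate(vectors))
    return heapq.nsmallest(k, scored)


if __name__ == "__main__":
    stored = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0], [0.9, 1.1]]
    for dist, idx in brute_force_top_k([1.0, 1.0], stored, k=2):
        print(idx, dist)
```

The inner distance function is the part that SIMD-optimized libraries speed up; the surrounding scan stays the same.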

DoctorGPT

Mentions: 7 | Stars: 213 | Activity: 7.2 | Language: Python

💻📚💡 DoctorGPT provides advanced LLM prompting for PDFs and webpages. (by FeatureBaseDB)

If anyone is interested in how to use something besides OpenAI (the ada-002 model) for embeddings, consider checking out Instructor Large: https://huggingface.co/hkunlp/instructor-large
There is some reference code in the DoctorGPT project that uses this approach: https://github.com/FeatureBaseDB/DoctorGPT. This project is designed to render PDFs as images and then run OCR on them (because not all PDFs have embedded text), and it uses FeatureBase as the storage/vector engine for prompt tuning and assembly.

swiss_army_llama

Mentions: 11 | Stars: 867 | Activity: 8.8 | Language: Python

A FastAPI service for semantic text search using precomputed embeddings and advanced similarity measures, with built-in support for various file types through textract.

Cool, I also made a similar kind of tool recently and shared it on HN a couple of weeks ago. You might find it useful for generating and managing LLM embeddings locally:
https://github.com/Dicklesworthstone/llama_embeddings_fastap...

qdrant

Mentions: 141 | Stars: 17,943 | Activity: 9.9 | Language: Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Awesome work!
At Qdrant we do this at scale: you can store billions of vectors in a cluster of any size. It's also written in Rust, which turned out to be an amazing choice, and it's fully open source. It uses various techniques to keep things performant, such as vectorization (on multiple architectures), quantization (a form of compression), and more.
https://github.com/qdrant/qdrant
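As a rough illustration of why quantization counts as compression, the sketch below maps each float to a small integer code plus one shared scale factor. It is a generic scalar-quantization example under assumed parameters (symmetric range of -127 to 127, hypothetical function names), not a description of how Qdrant implements it.

```python
def quantize(vec):
    """Map floats to integer codes in [-127, 127] plus a single scale factor."""
    max_abs = max(abs(x) for x in vec)
    # Avoid dividing by zero for an all-zero vector.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    codes = [round(x / scale) for x in vec]
    return codes, scale


def dequantize(codes, scale):
    """Approximately reconstruct the original floats from the codes."""
    return [c * scale for c in codes]


if __name__ == "__main__":
    original = [0.5, -1.0, 0.25, 0.0]
    codes, scale = quantize(original)
    print(codes)
    print(dequantize(codes, scale))
```

Each component then fits in one byte instead of four (for 32-bit floats), at the cost of a rounding error of at most half the scale factor per component.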

NOTE: The number of mentions on this list indicates mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Boost Your Code's Efficiency: Introducing Semantic Cache with Qdrant

2 projects | dev.to | 25 Apr 2024
Qdrant 1.8.0 - Major Performance Enhancements

2 projects | dev.to | 8 Mar 2024
Perform Image-Driven Reverse Image Search on E-Commerce Sites with ImageBind and Qdrant

3 projects | dev.to | 28 Feb 2024
Step-by-Step Guide to Building LLM Applications with Ruby (Using Langchain and Qdrant)

2 projects | dev.to | 31 Jan 2024
Qdrant - Using FastEmbed for Rapid Embedding Generation: A Benchmark and Guide

1 project | dev.to | 17 Jan 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
neural-network Matching search-engine knn-algorithm Hnsw
Post date: 23 Aug 2023
