New and Improved Embedding Model for OpenAI

Milvus

Mentions: 105, Stars: 26,979, Activity: 10.0, Language: Go

A cloud-native vector database, storage for next generation AI applications

Solid work, OpenAI, though I'd definitely like to see more benchmarks on a wider variety of datasets in addition to the ones listed in the post. Regardless, it's good to see embeddings becoming more and more mainstream and easier to leverage out of the box. We tried image embeddings many moons ago (2015) with AlexNet trained on a custom dataset, but we still had to add quite a few custom rules post-inference.
A large selling point for ada-002 embeddings seems to be the reduced dimensionality. While lower-dimensional embeddings definitely help performance, I would say the gain is still highly dependent on the index being used. Graph- and tree-based indexes will benefit less than ones based on IVF (https://zilliz.com/blog/vector-index), as they do fewer overall distance computations at query time, but the speedup would still be significant.
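To make the point about distance computations concrete, here is a rough pure-Python sketch of an IVF-style index. It is a toy under stated assumptions, not how Milvus or any real library implements IVF: centroids are just randomly sampled vectors (no k-means), and the names `build_ivf`, `ivf_search`, `n_lists`, and `n_probe` are made up for illustration. The search returns how many distance computations it performed; each one costs time proportional to the vector dimension, which is why an index that scans many vectors gains the most from shorter embeddings.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance; the cost grows linearly with the dimension."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def build_ivf(vectors, n_lists, seed=0):
    """Toy IVF build: sample random vectors as centroids (assumption: no k-means)
    and put each vector id into the list of its nearest centroid."""
    rng = random.Random(seed)
    centroids = rng.sample(vectors, n_lists)
    lists = [[] for _ in range(n_lists)]
    for idx, vec in enumerate(vectors):
        nearest = min(range(n_lists), key=lambda c: sq_dist(vec, centroids[c]))
        lists[nearest].append(idx)
    return centroids, lists


def ivf_search(query, vectors, centroids, lists, n_probe, k):
    """Return (ids of the top-k candidates, number of distance computations)."""
    # Step 1: compare the query against every centroid.
    order = sorted(range(len(centroids)), key=lambda c: sq_dist(query, centroids[c]))
    n_dist = len(centroids)
    # Step 2: scan only the vectors stored in the n_probe closest lists.
    candidates = [i for c in order[:n_probe] for i in lists[c]]
    n_dist += len(candidates)
    top = sorted(candidates, key=lambda i: sq_dist(query, vectors[i]))[:k]
    return top, n_dist


# Hypothetical data: 2,000 random 16-dimensional vectors.
rng = random.Random(42)
data = [[rng.gauss(0, 1) for _ in range(16)] for _ in range(2000)]
centroids, lists = build_ivf(data, n_lists=20)
ids, n_dist = ivf_search(data[0], data, centroids, lists, n_probe=3, k=5)
print(f"distance computations: {n_dist} (brute force would need {len(data)})")
```

Raising `n_probe` trades speed for recall: with `n_probe` equal to `n_lists` the search degenerates into a brute-force scan plus the centroid comparisons.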
I've still been meaning to try semantic search across Wikipedia via text embeddings. Will definitely play around with OpenAI + Milvus (https://github.com/milvus-io/milvus).
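For readers new to the idea, semantic search over embeddings boils down to ranking documents by vector similarity to the query. The sketch below shows only that ranking step with cosine similarity. The three-dimensional vectors and titles are hypothetical stand-ins for real text embeddings, and `semantic_search` is an illustrative name, not an OpenAI or Milvus API; a real setup would get the vectors from an embedding model and delegate the ranking to a vector database.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def semantic_search(query_vec, corpus, k=3):
    """corpus maps title -> embedding; returns the k best (title, score) pairs."""
    scored = [(title, cosine_similarity(query_vec, vec)) for title, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


# Hypothetical toy vectors standing in for real text embeddings.
toy_corpus = {
    "Vector database": [0.9, 0.1, 0.0],
    "Time series": [0.1, 0.9, 0.0],
    "Cooking": [0.0, 0.1, 0.9],
}
results = semantic_search([1.0, 0.2, 0.0], toy_corpus, k=2)
for title, score in results:
    print(f"{title}: {score:.3f}")
```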

qdrant

Mentions: 141, Stars: 17,943, Activity: 9.9, Language: Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Elasticsearch (ES) and OpenSearch (OS) are desperately slow because they are based on the Lucene vector search index. A dedicated vector database like Qdrant will always be a better choice: https://github.com/qdrant/qdrant

vector-db-benchmark

Mentions: 6, Stars: 227, Activity: 9.1, Language: Python

Framework for benchmarking vector search engines

Do we have any idea why Lucene vector search underperforms? As of Lucene 9.1 (and Elastic 8.4), it runs the same sort of filtered/categorical HNSW that Qdrant runs (https://lucene.apache.org/core/9_1_0/core/org/apache/lucene/...). Qdrant's benchmarking code (https://github.com/qdrant/vector-db-benchmark/blob/9263ba/en...) does use the new filtered ANN query with Elastic 8.4, so it appears to be a fair benchmark. Why is Lucene/Elastic so much slower? Is it a Rust vs. Java thing? Or some memory management issue?
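As background on what a "filtered" nearest-neighbor query is expected to return, here is a small sketch. It deliberately uses exact brute-force search rather than HNSW, and it says nothing about how Lucene or Qdrant implement filtering internally. The function names and the `category` field are assumptions made for illustration. It contrasts applying the filter before ranking with the naive approach of filtering a plain top-k afterwards, which can return fewer than k hits.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_knn(query, points, allowed, k):
    """Filter first, then rank: only points in an allowed category compete."""
    candidates = [p for p in points if p["category"] in allowed]
    return sorted(candidates, key=lambda p: sq_dist(query, p["vector"]))[:k]


def post_filtered_knn(query, points, allowed, k):
    """Naive alternative: take the unfiltered top-k, then drop disallowed points."""
    top = sorted(points, key=lambda p: sq_dist(query, p["vector"]))[:k]
    return [p for p in top if p["category"] in allowed]


# Hypothetical toy data set with a categorical payload.
points = [
    {"id": 0, "vector": [0.0, 0.0], "category": "a"},
    {"id": 1, "vector": [0.1, 0.0], "category": "b"},
    {"id": 2, "vector": [0.2, 0.0], "category": "b"},
    {"id": 3, "vector": [1.0, 1.0], "category": "a"},
]
query = [0.0, 0.0]
pre = [p["id"] for p in filtered_knn(query, points, {"a"}, k=2)]
post = [p["id"] for p in post_filtered_knn(query, points, {"a"}, k=2)]
print("filter then rank:", pre)
print("rank then filter:", post)
```

A benchmark that compares engines on filtered queries is only fair if every engine applies the filter during the search in this first sense, which is the point the comment above is checking.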

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Show HN: Danswer – open-source question answering across all your docs
7 projects | news.ycombinator.com | 10 Jul 2023

I've changed my mind about Code Interpretor
3 projects | /r/ChatGPT | 9 Jul 2023

A Critical Field Guide for Working with Machine Learning Datasets
3 projects | news.ycombinator.com | 17 Feb 2023

7 Vector Databases Every Developer Should Know!
4 projects | dev.to | 8 Feb 2024

FANN: Vector Search in 200 Lines of Rust
8 projects | news.ycombinator.com | 15 Jun 2023

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
nearest-neighbor-search approximate-nearest-neighbor-search Hnsw similarity-search vector-search
Post date: 15 Dec 2022
