Are we at peak vector database?

pgvector

78 9,211 9.9 C

Open-source vector similarity search for Postgres

It’s about to get a lot better too. pgvector now supports multi-threaded builds:
https://github.com/pgvector/pgvector/issues/409#issuecomment...

txtai

355 6,990 9.3 Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

I'll add txtai (https://github.com/neuml/txtai) to the list.
There is still plenty of room for innovation in this space. We just need to focus on the projects that are actually innovating, not the ones (re)working on problems solved in 2020/2021.

auth

49 5,439 9.6 TypeScript

Discontinued. Fully open source, end-to-end encrypted alternative to Google Photos and Apple Photos [Moved to: https://github.com/ente-io/ente]

Running machine learning on device.
Context: I'm working on an e2ee alternative to Google Photos[1] where we have to cluster embeddings (for face recognition) and run similarity searches (for semantic search[2]) on device.
[1]: https://ente.io
[2]: https://openai.com/research/clip

marqo

114 4,111 9.3 Python

Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

We (Marqo) are doing a lot on 1 and 2. There is a huge amount to be done on the ML side of vector search and we are investing heavily in it. I think it has not quite sunk in that vector search systems are ML systems and everything that comes with that. I would love to chat about 1 and 2 so feel free to email me (email is in my profile). What we have done so far is here -> https://github.com/marqo-ai/marqo

searcharray

4 151 9.7 Python

Pandas lexical matching (i.e., BM25) extension array

You might be interested in
https://github.com/softwaredoug/searcharray
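
For readers unfamiliar with the lexical matching mentioned above, here is a minimal sketch of BM25 scoring in plain Python. It is not searcharray's API; the function name `bm25_scores`, the whitespace tokenizer, and the defaults `k1=1.2` and `b=0.75` are assumptions chosen for illustration.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.2, b=0.75):
    """Score each document against the query with a common BM25 variant.

    Hypothetical helper for illustration only; tokenization is a plain
    lowercase whitespace split.
    """
    tokenized = [d.lower().split() for d in docs]
    n = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))

    scores = []
    for d in tokenized:
        tf = Counter(d)
        score = 0.0
        for term in query.lower().split():
            if term not in tf:
                continue
            # Smoothed inverse document frequency (always positive).
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            # Term-frequency saturation with document-length normalization.
            norm = tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(d) / avgdl)
            )
            score += idf * norm
        scores.append(score)
    return scores


docs = [
    "vector search in postgres",
    "lexical search with bm25",
    "photos app",
]
scores = bm25_scores("bm25 search", docs)
```

In this toy corpus the second document matches both query terms, the first matches only one, and the third matches none, so the scores rank them in that order.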

lantern

5 646 9.6 C

PostgreSQL vector database extension for building AI applications

Traditional DBs already kinda support vector search via extensions such as pgvector.
There is a YC startup, Lantern, that also built its own extension for Postgres. It is open source and is better for vector DB use cases: https://github.com/lanterndata/lantern
But yeah! Traditional DBs already support this, if you consider this extension to be part of Postgres.

sliders

3 718 8.5 Jupyter Notebook

Concept Sliders for Precise Control of Diffusion Models

> Always felt they're more like hashes/fingerprints for the RAG use cases.
Yes, I see where you’re coming from. Perceptual hashes[0] are pretty similar. The key is that similar documents should have similar embeddings (unlike cryptographic hashes, where a single bit flip in the input should produce a completely different hash).
Good embeddings encode information spatially. A classic example of embedding arithmetic is king - man + woman = queen[1]. “Concept Sliders” is a cool application of this to image generation[2].
Personally I’ve not had _too_ much trouble with running out of RAM due to embeddings themselves, but I did spend a fair amount of time last week profiling memory usage to make sure I didn’t run out in prod, so it is on my mind!
[0] https://en.m.wikipedia.org/wiki/Perceptual_hashing
[1] https://www.technologyreview.com/2015/09/17/166211/king-man-...
[2] https://github.com/rohitgandikota/sliders
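
The contrast drawn in the comment above can be sketched in a few lines of plain Python. The 2-D vectors are hypothetical toy values (axis 0 loosely "royalty", axis 1 loosely "gender"), not output from any real model, and the helper names `cosine`, `analogy`, and `hamming_bits` are made up for this illustration.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings, hand-picked so the analogy works.
toy = {
    "king": [0.9, 0.9],
    "queen": [0.9, -0.9],
    "man": [0.1, 0.9],
    "woman": [0.1, -0.9],
    "apple": [-0.8, 0.0],
}


def analogy(a, b, c, vocab):
    """Return the word closest to vocab[a] - vocab[b] + vocab[c]."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    # Exclude the input words, as is usual for this kind of lookup.
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


def hamming_bits(x, y):
    """Count differing bits between two equal-length byte strings."""
    return sum(bin(p ^ q).count("1") for p, q in zip(x, y))


result = analogy("king", "man", "woman", toy)

# Cryptographic hash: inputs differ by one bit ('e' is 0x65, 'd' is 0x64),
# yet the two digests are unrelated.
d1 = hashlib.sha256(b"vector database").digest()
d2 = hashlib.sha256(b"vector databasd").digest()
flipped = hamming_bits(d1, d2)
```

With these toy values `result` is "queen", while `flipped` counts how many of the 256 digest bits changed after a one-bit change to the input. Real embeddings are learned and have hundreds or thousands of dimensions, so the analogy only holds approximately there.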

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.