Storing OpenAI embeddings in Postgres with pgvector

postgres

Mentions: 15 | Stars: 1,269 | Activity: 9.5 | Language: Shell

Unmodified Postgres with some useful plugins (by supabase)

Hey HN, this one has a cool backstory that really shows the power of open source.
The author, Greg[0], wanted to use pgvector in a Postgres service, so he created a PR[1] in our Postgres repo. He then reached out and we decided it would be fun to collaborate on a project together, so he helped us build a "ChatGPT" interface for the Supabase docs (which we will release tomorrow).
This article explains all the steps you'd take to implement the same functionality yourself.
I want to give a shout-out to pgvector too; it's a great extension [2].
[0] Greg: https://twitter.com/ggrdson
[1] pgvector PR: https://github.com/supabase/postgres/pull/472
[2] pgvector: https://github.com/pgvector/pgvector
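
As a rough illustration of what "storing embeddings in Postgres" involves, here is a minimal Python sketch that only builds the SQL and the vector literal. The table name `documents`, its columns, and the `%s` placeholder style (as used by psycopg-like drivers) are assumptions made for this example and do not come from the article. The `vector(n)` column type, the `'[...]'` text format, and the `<=>` cosine-distance operator are part of pgvector.

```python
from typing import Sequence

# Hypothetical schema: the table and column names are illustrative only.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE IF NOT EXISTS documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)  -- 1536 is the OpenAI dimension quoted in this thread
);
"""

# %s placeholders assume a psycopg-style driver; adapt to your client library.
INSERT_SQL = "INSERT INTO documents (content, embedding) VALUES (%s, %s::vector)"

# <=> is pgvector's cosine-distance operator; smaller means more similar.
SEARCH_SQL = (
    "SELECT id, content FROM documents "
    "ORDER BY embedding <=> %s::vector LIMIT %s"
)


def to_vector_literal(embedding: Sequence[float]) -> str:
    """Format floats as pgvector's text input, e.g. '[0.1,0.2,3.0]'."""
    return "[" + ",".join(repr(float(x)) for x in embedding) + "]"


if __name__ == "__main__":
    # No database is contacted here; this only shows what would be sent.
    print(INSERT_SQL, ("hello world", to_vector_literal([0.1, 0.2, 3])))
```

With a real connection you would pass the literal as an ordinary query parameter; the embedding itself would come from whichever model you call.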

pgvector

Mentions: 78 | Stars: 9,211 | Activity: 9.9 | Language: C

Open-source vector similarity search for Postgres


Milvus

Mentions: 104 | Stars: 26,857 | Activity: 10.0 | Language: Go

A cloud-native vector database, storage for next generation AI applications

First time I've heard of pgvector. For folks with experience, how does it compare to other ANN plugins (e.g. Redis https://redis.io/docs/stack/search/reference/vectors/) and purpose-built vector databases (e.g. Milvus https://milvus.io)?
Curious about both performance/QPS and scale/# of vectors.

txtai

Mentions: 355 | Stars: 6,990 | Activity: 9.3 | Language: Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

You might want to check out txtai (https://github.com/neuml/txtai). Its default configuration is a FAISS index paired with a SQLite database for filtering.
Also worth mentioning that there are plenty of other vector models to try outside of OpenAI. Many are open-source and much smaller than 1536 dimensions. Check out the Hugging Face Hub (https://hf.co/models). For example, this model (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...) works great in many cases and is only 384 dimensions. It runs great locally and is FOSS.
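
To make the "vector index paired with a SQL database for filtering" idea concrete, here is a small sketch using only the Python standard library. It is not txtai's implementation: an exhaustive cosine-similarity scan stands in for FAISS, the toy 3-dimensional vectors stand in for real embeddings, and the `docs` table, its columns, and the `search` helper are hypothetical names invented for this example. Filtering before ranking is also just one possible ordering.

```python
import math
import sqlite3


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


# Toy embeddings keyed by document id; a real model would emit e.g. 384 or 1536 floats.
vectors = {
    1: [1.0, 0.0, 0.0],
    2: [0.9, 0.1, 0.0],
    3: [0.0, 1.0, 0.0],
}

# Metadata lives in SQLite so that ordinary SQL can express the filter.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE docs (id INTEGER PRIMARY KEY, text TEXT, lang TEXT)")
db.executemany(
    "INSERT INTO docs VALUES (?, ?, ?)",
    [
        (1, "postgres vector search", "en"),
        (2, "recherche vectorielle", "fr"),
        (3, "cooking recipes", "en"),
    ],
)


def search(query, lang, k=2):
    """Return up to k document ids in the given language, most similar first."""
    # Step 1: SQL decides which rows are eligible.
    rows = db.execute("SELECT id FROM docs WHERE lang = ?", (lang,))
    allowed = [row[0] for row in rows]
    # Step 2: rank the eligible rows by similarity (brute force stands in for FAISS).
    ranked = sorted(allowed, key=lambda i: cosine(query, vectors[i]), reverse=True)
    return ranked[:k]
```

Calling `search([1.0, 0.0, 0.0], "en")` ranks only the English rows, so document 2 is never considered even though its vector is close to the query.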

Typesense

Mentions: 129 | Stars: 17,965 | Activity: 9.8 | Language: C++

Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

Disclaimer: I work on Typesense [1] (an open source alternative to Algolia + Pinecone) and we recently added Vector Search as a feature to Typesense [2].
Postgres can do a lot of things, but for large enough datasets and/or when you want to add filtering into the mix along with vector search, then it becomes slow. And at that point you want to use a dedicated vector search database.
It's similar to how Postgres can also do full-text search, but for large datasets and/or when you want to add typo tolerance, faceting, grouping, filtering, synonyms, etc. (the usual features you'd need when implementing a search feature), it becomes slow to do this in pg and you'd then use a dedicated search engine.
In Typesense, we've now combined Vector Search along with filtering based on attributes in your documents, so you get the best of both worlds [2].
[1] https://typesense.org/

faiss

Mentions: 71 | Stars: 28,202 | Activity: 9.4 | Language: C++

A library for efficient similarity search and clustering of dense vectors.

One downside of pgvector is that it currently only supports one type of index (ivfflat), while others (FAISS, Milvus, qdrant, etc.) support other types of indices that can be advantageous depending on your workload (properties of vectors, size of dataset). See [1] for some more background.
[1] https://github.com/facebookresearch/faiss/wiki/Guidelines-to...
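
For readers unfamiliar with the index type named above, the following is a rough pure-Python sketch of the general inverted-file ("IVF flat") idea: vectors are grouped into lists around centroids, and a query scans only the lists whose centroids are nearest. It is not pgvector's code. The naive k-means initialisation, the function names, and the toy data are assumptions made for this example; the `n_lists` and `probes` arguments are loosely modelled on pgvector's `lists` and `probes` settings.

```python
import math


def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def mean(points):
    """Component-wise mean of a non-empty list of vectors."""
    return [sum(col) / len(points) for col in zip(*points)]


def assign(vectors, centroids):
    """Put each vector's index into the list of its nearest centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        nearest = min(range(len(centroids)), key=lambda c: l2(v, centroids[c]))
        lists[nearest].append(idx)
    return lists


def build_ivfflat(vectors, n_lists, iterations=5):
    """Run a few rounds of k-means and return (centroids, lists)."""
    centroids = [list(v) for v in vectors[:n_lists]]  # naive init: first n_lists vectors
    for _ in range(iterations):
        lists = assign(vectors, centroids)
        # Move each centroid to the mean of its members; keep it if its list is empty.
        centroids = [
            mean([vectors[i] for i in members]) if members else centroids[c]
            for c, members in enumerate(lists)
        ]
    return centroids, assign(vectors, centroids)


def search(query, vectors, centroids, lists, k=1, probes=1):
    """Scan only the `probes` nearest lists, then rank those candidates exactly."""
    nearest_lists = sorted(range(len(centroids)), key=lambda c: l2(query, centroids[c]))
    candidates = [i for c in nearest_lists[:probes] for i in lists[c]]
    return sorted(candidates, key=lambda i: l2(query, vectors[i]))[:k]


# Two obvious clusters: one near the origin and one near (10, 10).
vectors = [[0.0, 0.0], [10.0, 10.0], [0.0, 1.0], [1.0, 0.0], [10.0, 11.0], [11.0, 10.0]]
centroids, lists = build_ivfflat(vectors, n_lists=2)
```

Raising `probes` scans more lists, which trades speed for recall. With `probes` equal to the number of lists, the search degenerates to an exact scan of every vector.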

hnswlib

Mentions: 12 | Stars: 4,015 | Activity: 6.2 | Language: C++

Header-only C++/python library for fast approximate nearest neighbors

https://github.com/nmslib/hnswlib
Used it to index 40M text snippets in the legal domain. Allows incremental adding.
I love how it just works. You know, it doesn’t ANNOY me or make a FAISS. ;-)

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
