Storing OpenAI embeddings in Postgres with pgvector

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • postgres

    Unmodified Postgres with some useful plugins (by supabase)

  • Hey HN, this one has a cool backstory that really shows the power of open source.

    The author, Greg[0], wanted to use pgvector in a Postgres service, so he created a PR[1] in our Postgres repo. He then reached out and we decided it would be fun to collaborate on a project together, so he helped us build a "ChatGPT" interface for the Supabase docs (which we will release tomorrow).

    This article explains all the steps you'd take to implement the same functionality yourself.

    I want to give a shout-out to pgvector too; it's a great extension [2].

    [0] Greg: https://twitter.com/ggrdson

    [1] pgvector PR: https://github.com/supabase/postgres/pull/472

    [2] pgvector: https://github.com/pgvector/pgvector
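As a rough illustration of the storage and query steps discussed here, the following sketch keeps the SQL in Python strings so it runs without a database. The table name `documents`, its columns, and the helper `to_pgvector_literal` are hypothetical; the 1536 dimension is the figure quoted elsewhere in this thread for OpenAI embeddings. The `%s` placeholder assumes a driver such as psycopg.

```python
# Hypothetical schema: table and column names are illustrative only.
CREATE_SQL = """
CREATE EXTENSION IF NOT EXISTS vector;
CREATE TABLE documents (
    id bigserial PRIMARY KEY,
    content text,
    embedding vector(1536)  -- dimension quoted in the thread for OpenAI embeddings
);
"""

# <=> is pgvector's cosine distance operator; smaller means more similar.
SEARCH_SQL = """
SELECT id, content
FROM documents
ORDER BY embedding <=> %s::vector
LIMIT 5;
"""


def to_pgvector_literal(values):
    """Format a sequence of numbers as pgvector text input, e.g. '[1.0,2.5]'."""
    return "[" + ",".join(repr(float(v)) for v in values) + "]"


if __name__ == "__main__":
    # With a real connection you would pass this string as the query parameter,
    # e.g. cursor.execute(SEARCH_SQL, (to_pgvector_literal(query_embedding),)).
    print(to_pgvector_literal([1, 2.5, -0.25]))
```

Passing the vector as text and casting it with `::vector` avoids needing a driver-specific adapter, at the cost of some serialization overhead.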

  • pgvector

    Open-source vector similarity search for Postgres

  • Milvus

    A cloud-native vector database, storage for next generation AI applications

  • First time I've heard of pgvector - for folks with experience, how does it compare to other ANN plugins (e.g. Redis https://redis.io/docs/stack/search/reference/vectors/) and purpose-built vector databases (e.g. Milvus https://milvus.io)?

    Curious about both performance/QPS and scale/# of vectors.

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • You might want to check out txtai (https://github.com/neuml/txtai). Its default configuration is a FAISS index paired with a SQLite database for filtering.

    Also worth mentioning that there are plenty of other vector models to try outside of OpenAI. Many are open source and much smaller than 1536 dimensions. Check out the Hugging Face Hub (https://hf.co/models). For example, this model (https://huggingface.co/sentence-transformers/all-MiniLM-L6-v...) works great in many cases and is only 384 dimensions. It runs well locally and is FOSS.
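To put the 1536 versus 384 dimension comparison in concrete terms, here is a small back-of-the-envelope calculation. It assumes each component is stored as a 4-byte float and ignores per-row and index overhead, so treat the numbers as a lower bound rather than a measurement. The function name `raw_vector_bytes` is made up for this sketch.

```python
BYTES_PER_FLOAT32 = 4  # assumption: single-precision components


def raw_vector_bytes(dimensions, count=1):
    """Raw payload size in bytes for `count` vectors of the given dimension."""
    return dimensions * BYTES_PER_FLOAT32 * count


if __name__ == "__main__":
    for dims in (1536, 384):
        per_vector = raw_vector_bytes(dims)
        per_million_gb = raw_vector_bytes(dims, count=1_000_000) / 1e9
        print(f"{dims} dims: {per_vector} bytes per vector, "
              f"{per_million_gb:.2f} GB per million vectors")
```

Under these assumptions a 384-dimension vector needs a quarter of the raw space of a 1536-dimension one, which also means less data to scan per distance computation.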

  • Typesense

    Open Source alternative to Algolia + Pinecone and an Easier-to-Use alternative to ElasticSearch ⚡ 🔍 ✨ Fast, typo tolerant, in-memory fuzzy Search Engine for building delightful search experiences

  • Disclaimer: I work on Typesense [1] (an open source alternative to Algolia + Pinecone) and we recently added Vector Search as a feature to Typesense [2].

    Postgres can do a lot of things, but for large enough datasets and/or when you want to add filtering into the mix along with vector search, then it becomes slow. And at that point you want to use a dedicated vector search database.

    It's similar to how Postgres can also do full text search, but for large datasets and/or when you want to add typo tolerance, faceting, grouping, filtering, synonyms, etc. (the usual features you'd need when implementing a search feature), it becomes slow to do this in pg and you'd then use a dedicated search engine.

    In Typesense, we've now combined Vector Search along with filtering based on attributes in your documents, so you get the best of both worlds [2].

    [1] https://typesense.org/
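For readers unfamiliar with what "filtering plus vector search" means, here is a deliberately naive sketch: apply the attribute filter first, then rank the survivors by cosine similarity. It is a brute-force illustration of the concept only, not how Typesense or Postgres implements it, and the document shape (`embedding` key plus arbitrary attributes) is an assumption.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def filtered_search(docs, query, k, predicate):
    """Keep docs matching `predicate`, then return the k most similar to `query`."""
    matches = [d for d in docs if predicate(d)]
    matches.sort(key=lambda d: cosine_similarity(d["embedding"], query), reverse=True)
    return matches[:k]


if __name__ == "__main__":
    docs = [
        {"id": "a", "category": "blog", "embedding": [1.0, 0.0]},
        {"id": "b", "category": "docs", "embedding": [0.9, 0.1]},
        {"id": "c", "category": "docs", "embedding": [0.0, 1.0]},
    ]
    top = filtered_search(docs, [1.0, 0.0], k=1,
                          predicate=lambda d: d["category"] == "docs")
    print([d["id"] for d in top])
```

The cost of this approach grows linearly with the number of matching documents, which is the scaling problem the comment above describes.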

  • faiss

    A library for efficient similarity search and clustering of dense vectors.

  • One downside of pgvector is that it currently only supports one type of index (ivfflat), while others (FAISS, Milvus, qdrant, etc.) support other types of indices that can be advantageous depending on your workload (properties of vectors, size of dataset). See [1] for some more background.

    [1] https://github.com/facebookresearch/faiss/wiki/Guidelines-to...
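For context on what an inverted-file flat (IVF flat) index does, here is a toy pure-Python sketch of the general idea: cluster the vectors into lists around centroids, then at query time scan only the lists whose centroids are closest to the query. This is not pgvector's implementation; the function names are invented for illustration, and the `n_lists` and `probes` parameters merely echo the `lists` and `probes` settings pgvector exposes for ivfflat.

```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest(centroids, v):
    """Index of the centroid closest to v."""
    return min(range(len(centroids)), key=lambda i: sq_dist(centroids[i], v))


def assign(vectors, centroids):
    """Group vector indices into one list per centroid."""
    lists = [[] for _ in centroids]
    for idx, v in enumerate(vectors):
        lists[nearest(centroids, v)].append(idx)
    return lists


def build_ivf(vectors, n_lists, iterations=5, seed=0):
    """A few rounds of k-means, then a final assignment of vectors to lists."""
    rng = random.Random(seed)
    dim = len(vectors[0])
    centroids = [list(v) for v in rng.sample(vectors, n_lists)]
    for _ in range(iterations):
        for i, members in enumerate(assign(vectors, centroids)):
            if members:  # keep the previous centroid if a list ends up empty
                centroids[i] = [
                    sum(vectors[m][d] for m in members) / len(members)
                    for d in range(dim)
                ]
    return centroids, assign(vectors, centroids)


def search(vectors, centroids, lists, query, k, probes):
    """Scan only the `probes` lists nearest the query and return the top k indices."""
    order = sorted(range(len(centroids)), key=lambda i: sq_dist(centroids[i], query))
    candidates = [idx for i in order[:probes] for idx in lists[i]]
    return sorted(candidates, key=lambda idx: sq_dist(vectors[idx], query))[:k]
```

Probing every list reduces to an exact brute-force scan, while fewer probes trade recall for speed. Other index families, such as graph-based ones, make a different trade-off, which is the point of the comment above.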

  • hnswlib

    Header-only C++/python library for fast approximate nearest neighbors

  • https://github.com/nmslib/hnswlib

    Used it to index 40M text snippets in the legal domain. Allows incremental adding.

    I love how it just works. You know, doesn’t ANNOY me or make a FAISS. ;-)
