Are we at peak vector database?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • pgvector

    Open-source vector similarity search for Postgres

  • It’s about to get a lot better too. Pgvector now supports multi-threaded build

    https://github.com/pgvector/pgvector/issues/409#issuecomment...

  • txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  • I'll add txtai (https://github.com/neuml/txtai) to the list.

    There is still plenty of room for innovation in this space. Just need to focus on the right projects that are innovating and not the ones (re)working on problems solved in 2020/2021.

  • auth

    Discontinued. Fully open source, end-to-end encrypted alternative to Google Photos and Apple Photos [Moved to: https://github.com/ente-io/ente]

  • Running machine learning on device.

    Context: I'm working on an e2ee alternative to Google Photos[1] where we have to cluster embeddings (for face recognition) and run similarity searches (for semantic search[2]) on device.

    [1]: https://ente.io

    [2]: https://openai.com/research/clip

  • marqo

    Unified embedding generation and search engine. Also available on cloud - cloud.marqo.ai

  • We (Marqo) are doing a lot on 1 and 2. There is a huge amount to be done on the ML side of vector search and we are investing heavily in it. I think it has not quite sunk in that vector search systems are ML systems and everything that comes with that. I would love to chat about 1 and 2 so feel free to email me (email is in my profile). What we have done so far is here -> https://github.com/marqo-ai/marqo

  • searcharray

    Pandas lexical matching (ie BM25) extension array

  • You might be interested in

    https://github.com/softwaredoug/searcharray

  • lantern

    PostgreSQL vector database extension for building AI applications

  • Traditional DBs already kinda support vector DBs via extensions such as pgvector.

    There is a YC startup, Lantern, that also built their own extension for Postgres that is open source and is better for vector DB use cases: https://github.com/lanterndata/lantern

    But yeah! Traditional DBs already support this, if you consider this extension to be part of Postgres.

  • sliders

    Concept Sliders for Precise Control of Diffusion Models

  • > Always felt they're more like hashes/fingerprints for the RAG use cases.

    Yes, I see where you’re coming from. Perceptual hashes[0] are pretty similar; the key is that similar documents should have similar embeddings (unlike cryptographic hashes, where a single bit flip should produce a completely different hash).

    Nice embeddings encode information spatially. A classic example of embedding arithmetic is: king - man + woman = queen[1]. “Concept Sliders” is a cool application of this to image generation[2].

    Personally I’ve not had _too_ much trouble with running out of RAM due to embeddings themselves, but I did spend a fair amount of time last week profiling memory usage to make sure I didn’t run out in prod, so it is on my mind!

    [0] https://en.m.wikipedia.org/wiki/Perceptual_hashing

    [1] https://www.technologyreview.com/2015/09/17/166211/king-man-...

    [2] https://github.com/rohitgandikota/sliders
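
The two ideas in the comment above (similar inputs get similar embeddings, unlike cryptographic hashes, and embedding arithmetic such as king - man + woman = queen) can be illustrated with a minimal sketch. Everything here is an assumption for illustration only: the 2-D and 3-D vectors are hand-made toy values, not output from any real embedding model, and the helper names `cosine`, `analogy`, and `hamming_bits` are hypothetical.

```python
import hashlib
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
# Real embeddings have hundreds of dimensions and no such clean axes.
toy = {
    "king": [1.0, 1.0],
    "queen": [1.0, -1.0],
    "prince": [0.8, 1.0],
    "man": [0.0, 1.0],
    "woman": [0.0, -1.0],
    "apple": [-1.0, 0.1],
}


def analogy(a, b, c, vocab):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the inputs."""
    target = [x - y + z for x, y, z in zip(vocab[a], vocab[b], vocab[c])]
    candidates = {w: v for w, v in vocab.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine(target, candidates[w]))


def hamming_bits(x: bytes, y: bytes) -> int:
    """Number of differing bits between two equal-length byte strings."""
    return sum(bin(p ^ q).count("1") for p, q in zip(x, y))


# Embedding arithmetic on the toy vocabulary.
result = analogy("king", "man", "woman", toy)

# Cryptographic hash: a one-character change yields an unrelated digest.
digest_a = hashlib.sha256(b"vector database").digest()
digest_b = hashlib.sha256(b"vector databasf").digest()
bits_changed = hamming_bits(digest_a, digest_b)

# Toy embeddings of two near-duplicate documents stay close together.
doc_a = [0.90, 0.10, 0.30]
doc_b = [0.88, 0.12, 0.31]
doc_similarity = cosine(doc_a, doc_b)

print(result, bits_changed, round(doc_similarity, 4))
```

On a 256-bit SHA-256 digest, roughly half the bits are expected to differ after a small input change, whereas the cosine similarity of the two toy document vectors stays close to 1. That contrast is the sense in which embeddings behave more like perceptual hashes than cryptographic ones.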

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
