Milvus is completely open source (https://github.com/milvus-io/milvus) and supports a variety of index types (https://milvus.io/docs/overview.md#Index-types), various consistency levels, scalar/metadata filtering, and time travel. We started working on Milvus back in 2018, and 2.0 was released in January 2022 (https://github.com/milvus-io/milvus/releases/tag/v2.0.0).
For those interested, here's a comparison with other open source vector databases: https://zilliz.com/comparison. For those who don't want to be burdened with installing and maintaining a local database, there's a managed service available as well: https://zilliz.com/cloud.
-
towhee
Towhee is a framework that is dedicated to making neural data processing pipelines simple and fast.
As another commenter noted, Milvus is overkill and a "bit much" if you're learning/playing.
A good intro to the field with progression towards a full Milvus implementation could be starting with towhee[0] (which is also supported by Milvus).
towhee has an example to do exactly what you want with CLIP[1].
[0] - https://towhee.io/
[1] - https://github.com/towhee-io/examples/tree/main/image/text_i...
-
https://github.com/currentslab/awesome-vector-search
I was surprised to see Elastic actually has ok support for some of this stuff, though it appears slower for most of the tasks.
-
typesense-instantsearch-semantic-search-demo
A demo that shows how to build a semantic search experience with Typesense's vector search feature and Instantsearch.js
We added HNSW-based vector search to Typesense as well recently: https://typesense.org/docs/0.24.0/api/vector-search.html
So you can combine attribute-based filters along with nearest-neighbor search.
Put together this semantic search + filtering demo just last week: https://github.com/typesense/typesense-instantsearch-semanti...
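To make the idea of combining attribute filters with nearest-neighbor search concrete, here is a minimal sketch in plain Python. It is not Typesense's API and it does not use HNSW: it scans every document exactly, which only works for tiny collections. The documents, field names, and the `filtered_search` helper are all hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def filtered_search(docs, query_vec, predicate, k=2):
    """Apply the attribute filter first, then rank the survivors by similarity.

    This is an exact brute-force scan (an assumption made for brevity);
    a real engine would use an approximate index such as HNSW instead.
    """
    candidates = [d for d in docs if predicate(d)]
    ranked = sorted(
        candidates,
        key=lambda d: cosine_similarity(d["vec"], query_vec),
        reverse=True,
    )
    return ranked[:k]


# Hypothetical toy data: hand-written 2-d "embeddings" plus one attribute.
DOCS = [
    {"id": "a", "category": "shoes", "vec": [1.0, 0.0]},
    {"id": "b", "category": "shoes", "vec": [0.6, 0.8]},
    {"id": "c", "category": "hats", "vec": [0.9, 0.1]},
    {"id": "d", "category": "shoes", "vec": [0.0, 1.0]},
]

results = filtered_search(
    DOCS, [1.0, 0.0], lambda d: d["category"] == "shoes", k=2
)
result_ids = [d["id"] for d in results]
print(result_ids)
```

Document "c" is very close to the query vector but is excluded because it fails the category filter, which is the behavior the filter + nearest-neighbor combination is meant to give.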
-
I really don't want another database. I just want a solution built into Postgres, and more specifically RDS, which we use. I know there will be some extra difficulty that I will have to manage (e.g. reindexing when a new model outputs different embeddings), but I really don't want another piece of infrastructure.
If anyone from AWS/Google/Azure is listening, please add pgvector [1] into your managed Postgres offerings!
-
That is great! I'll keep an eye on it.
I've been playing with this extension: https://github.com/asg017/sqlite-vss
-
qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
tbh, it looks like a huge, overengineered legacy project. What's the point of having all these ANN indexes in place? Is it some kind of art collection? What's the sense in that when you can just have HNSW in memory, with quantization, or on disk, GPU accelerated, etc.? There are already better alternatives like Qdrant, which is written in Rust and super performant (https://github.com/qdrant/qdrant), or Weaviate with its GraphQL interface (https://github.com/weaviate/weaviate).
-
Weaviate
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
-
Didn't even realise Milvus was so lacking. https://github.com/marqo-ai/marqo also has a hybrid approach. It's just a more complete/end-to-end platform than Pinecone, so it really just depends on what you're building.
-
autofaiss
Automatically create Faiss knn indices with the most optimal similarity search parameters.
Don't start with Milvus if you're learning. Too much yak shaving. Try https://github.com/criteo/autofaiss.
Also, TBH, it is a lot cheaper to run a simple faiss index.
-
If ES doesn't work for you, I recommend Vespa. https://github.com/vespa-engine/vespa
Others have made other suggestions, but Vespa has two unique features. First, it is battle tested at a large scale. Second, it supports combining the keyword and vector scores in several ways. The latter is something that other hybrid systems, including ES/Solr, don't do very well in my experience.
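As a rough illustration of what "combining the keyword and vector scores" can mean, the sketch below uses one common generic approach: min-max normalize each score list, then take a weighted sum. This is not Vespa's ranking API or its actual method; the function names, the `alpha` weight, and the scores are all made up for the example.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores are equal, so every document gets the same value.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_rank(keyword_scores, vector_scores, alpha=0.5):
    """Rank documents by alpha * keyword + (1 - alpha) * vector score.

    Both inputs are normalized first because raw keyword scores (e.g. BM25)
    and vector similarities live on different scales. A document missing
    from one list contributes 0.0 for that component.
    """
    kw = min_max_normalize(keyword_scores)
    vec = min_max_normalize(vector_scores)
    doc_ids = set(kw) | set(vec)
    combined = {
        d: alpha * kw.get(d, 0.0) + (1 - alpha) * vec.get(d, 0.0)
        for d in doc_ids
    }
    # Highest combined score first; ties broken by document id.
    return sorted(combined, key=lambda d: (-combined[d], d))


# Hypothetical scores: doc1 wins on keywords, doc2 wins on vectors.
keyword_scores = {"doc1": 12.0, "doc2": 3.0, "doc3": 7.5}
vector_scores = {"doc1": 0.20, "doc2": 0.90, "doc3": 0.80}

keyword_heavy = hybrid_rank(keyword_scores, vector_scores, alpha=0.7)
vector_heavy = hybrid_rank(keyword_scores, vector_scores, alpha=0.3)
print(keyword_heavy)
print(vector_heavy)
```

Changing `alpha` reorders the results, which is why the way the two signals are combined matters in practice and why engines differ in how well they support it.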
-
examples
Analyze the unstructured data with Towhee, such as reverse image search, reverse video search, audio classification, question and answer systems, molecular search, etc. (by towhee-io)
-
txtai
💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
txtai combines SQLite and Faiss to enable vector search. It also does a lot more than that.
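The general pattern of pairing a SQL store with a vector index can be sketched with the standard library alone. This is not txtai's code or API: SQLite holds the document text, and a plain dictionary with a brute-force scan stands in for Faiss. The table name, the vectors, and the `search` helper are hypothetical.

```python
import math
import sqlite3


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Content store: SQLite keeps the original text, keyed by document id.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO documents (id, text) VALUES (?, ?)",
    [
        (1, "intro to vector search"),
        (2, "postgres backup guide"),
        (3, "time series dashboards"),
    ],
)

# Vector store: a dict standing in for a Faiss index (assumption).
# The embeddings are hand-written toy values, not model output.
vectors = {
    1: [0.9, 0.1, 0.0],
    2: [0.1, 0.9, 0.0],
    3: [0.0, 0.2, 0.9],
}


def search(query_vec, k=1):
    """Find the k most similar ids, then fetch their text from SQLite."""
    top_ids = sorted(
        vectors,
        key=lambda doc_id: cosine_similarity(vectors[doc_id], query_vec),
        reverse=True,
    )[:k]
    results = []
    for doc_id in top_ids:
        (text,) = conn.execute(
            "SELECT text FROM documents WHERE id = ?", (doc_id,)
        ).fetchone()
        results.append((doc_id, text))
    return results


hits = search([0.8, 0.2, 0.0], k=1)
print(hits)
```

The split keeps each store doing what it is good at: the vector side only returns ids, and everything else about a document is looked up with ordinary SQL.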
-
Don't start with the clustered version of Milvus unless you have something like 100 million vectors.
Try Milvus standalone instead; it's much simpler. I also just found their Python version (https://github.com/milvus-io/embd-milvus), which is quite neat.