Weaviate
Weaviate is an open-source vector database that stores both objects and vectors, combining vector search and structured filtering with the fault tolerance and scalability of a cloud-native database.
Check out the open source vector search engine Weaviate: https://github.com/semi-technologies/weaviate
It's not a relational database, but it supports graph-like connections between objects, which makes it really easy to model your relations.
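To make "vector search with structured filtering" concrete, here is a minimal brute-force sketch in plain Python. It is not Weaviate's API: the `OBJECTS` data, the `lang` property, and the `filtered_search` helper are all hypothetical, and a real engine would use an approximate index instead of scanning every object.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

# Hypothetical toy objects: each stores its properties alongside its vector.
OBJECTS = [
    {"title": "Intro to HNSW", "lang": "en", "vector": [0.9, 0.1, 0.0]},
    {"title": "HNSW erklaert", "lang": "de", "vector": [0.95, 0.05, 0.0]},
    {"title": "Baking bread", "lang": "en", "vector": [0.0, 0.2, 0.9]},
]

def filtered_search(query_vector, where, k=2):
    """Apply the structured filter first, then rank survivors by similarity."""
    candidates = [
        o for o in OBJECTS
        if all(o.get(field) == value for field, value in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda o: cosine(query_vector, o["vector"]),
        reverse=True,
    )
    return [o["title"] for o in ranked[:k]]

# Only English objects are considered, even though the German one is closer.
results = filtered_search([1.0, 0.0, 0.0], {"lang": "en"})
```

The point of the sketch is the order of operations: the filter narrows the candidate set, and similarity only ranks what remains.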
-
qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
We are developing open-source vector search technology. https://github.com/qdrant/qdrant It is a neural search engine with extended filtering support that implements a custom modification of the HNSW algorithm for Approximate Nearest Neighbour search.
-
semantic-search-through-wikipedia-with-weaviate
[Discontinued] Semantic search through a vectorized Wikipedia (SentenceBERT) with the Weaviate vector search engine
* Wikipedia demo dataset: https://github.com/semi-technologies/semantic-search-through...
-
biggraph-wikidata-search-with-weaviate
Search through Facebook Research's PyTorch BigGraph Wikidata-dataset with the Weaviate vector search engine
* Wikidata dataset: https://github.com/semi-technologies/biggraph-wikidata-searc...
Last week there was also a feature on TechCrunch about vector search and Weaviate:
-
Check out pgvector: https://github.com/ankane/pgvector (disclosure: am author)
It uses IVFFlat indexing, but could be extended to support product quantization / ScaNN.
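For readers unfamiliar with IVFFlat, the idea can be sketched in a few lines of plain Python. This is an illustration of the general technique, not pgvector's implementation: the centroids are assumed to be given (normally they are learned with k-means), and the names `build_ivf`, `search`, and the toy data are hypothetical. The `probes` parameter mirrors the usual trade-off between speed and recall.

```python
import math

def l2(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(point, centroids):
    """Index of the centroid closest to point."""
    return min(range(len(centroids)), key=lambda i: l2(point, centroids[i]))

def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = {i: [] for i in range(len(centroids))}
    for vid, vec in enumerate(vectors):
        lists[nearest(vec, centroids)].append(vid)
    return lists

def search(query, vectors, centroids, lists, k=1, probes=1):
    """Scan only the `probes` closest lists, comparing full (flat) vectors."""
    order = sorted(range(len(centroids)), key=lambda i: l2(query, centroids[i]))
    candidates = [vid for i in order[:probes] for vid in lists[i]]
    return sorted(candidates, key=lambda vid: l2(query, vectors[vid]))[:k]

VECTORS = [[0.0, 0.1], [0.2, 0.0], [5.0, 5.1], [5.2, 4.9], [0.1, 0.3]]
CENTROIDS = [[0.1, 0.1], [5.1, 5.0]]  # assumed given; normally from k-means
LISTS = build_ivf(VECTORS, CENTROIDS)
hits = search([5.0, 5.0], VECTORS, CENTROIDS, LISTS, k=2)
```

"Flat" here means the vectors inside each list are stored and compared uncompressed; product quantization would replace them with compact codes.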
-
Not the author, but at work we've had in the hundreds of millions. Faiss can certainly scale.
If you do have a tiny index and want to try Google's version of vector search (as an alternative to Faiss), you can easily run ScaNN locally [1] (linked in the article; that's the underlying tech). At small scale I had better performance with ScaNN.
[1] https://github.com/google-research/google-research/tree/mast...
-
CLIP
CLIP (Contrastive Language-Image Pretraining), Predict the most relevant text snippet given an image
I agree that having a good vector is important to start with. However, it is not very hard to make this work: you only need to fine-tune some of the CLIP models [1] for it to run well.
Disclosure: I have built a vector search engine to prove this idea [2].
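As a rough illustration of how CLIP-style matching picks the most relevant text snippet for an image, the sketch below scores toy embeddings with cosine similarity and a softmax. No real model is loaded: the embeddings, the snippets, and the `scale` value are all assumptions made up for the example.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def rank_snippets(image_embedding, text_embeddings, scale=100.0):
    """Softmax over scaled cosine similarities between one image and each text."""
    img = normalize(image_embedding)
    logits = [
        scale * sum(a * b for a, b in zip(img, normalize(t)))
        for t in text_embeddings
    ]
    top = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - top) for l in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical embeddings; a real system would get these from the encoders.
SNIPPETS = ["a photo of a dog", "a photo of a cat", "a diagram"]
TEXT_EMBEDDINGS = [[0.9, 0.1, 0.0], [0.1, 0.9, 0.0], [0.0, 0.1, 0.9]]
IMAGE_EMBEDDING = [0.8, 0.2, 0.1]

probs = rank_snippets(IMAGE_EMBEDDING, TEXT_EMBEDDINGS)
best = SNIPPETS[probs.index(max(probs))]
```

Fine-tuning changes the embeddings the encoders produce; the scoring step stays the same.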
-
If anyone is interested, I maintain a list of open source vector search engine services[1].
Feel free to submit a new issue or merge request if you would like a new library added.
-
sample-apps
Repository of sample applications for https://vespa.ai, the open big data serving engine
Vespa.ai supports combining dense vector search with keyword search and ranking, see https://docs.google.com/presentation/d/1vWKhSvFH-4MFcs4aNa9C...
There is also a Vespa sample application (open source, Apache 2) demonstrating multiple different retrieval and ranking strategies over at https://github.com/vespa-engine/sample-apps/blob/master/msma...
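One simple way to combine dense vector search with keyword search is a weighted sum of the two scores. The sketch below is not Vespa's ranking framework: the `DOCS` data, the crude term-overlap `keyword_score` (a stand-in for something like BM25), and the `alpha` weight are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_score(query, text):
    """Fraction of query terms that occur in the text (crude BM25 stand-in)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms) if terms else 0.0

# Hypothetical documents with both text and a dense vector.
DOCS = [
    {"id": "a", "text": "vector search with hnsw", "vector": [0.9, 0.1]},
    {"id": "b", "text": "keyword search with inverted indexes", "vector": [0.2, 0.8]},
    {"id": "c", "text": "cooking pasta at home", "vector": [0.85, 0.15]},
]

def hybrid_rank(query, query_vector, alpha=0.5):
    """Rank by alpha * dense score + (1 - alpha) * keyword score."""
    scored = [
        (
            alpha * cosine(query_vector, d["vector"])
            + (1 - alpha) * keyword_score(query, d["text"]),
            d["id"],
        )
        for d in DOCS
    ]
    return [doc_id for _, doc_id in sorted(scored, reverse=True)]

ranking = hybrid_rank("vector search", [1.0, 0.0])
```

Setting `alpha=0.0` falls back to keyword-only ranking and `alpha=1.0` to dense-only ranking, which makes it easy to see what each signal contributes.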
-