Top 19 neural-search Open-Source Projects
-
qdrant
Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/
-
PaddleNLP
Easy-to-use and powerful NLP and LLM library with an awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including Text Classification, Neural Search, Question Answering, Information Extraction, Document Intelligence, Sentiment Analysis, etc.
-
Weaviate
Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.
-
txtai
All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows
-
refinery
The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
-
primeqa
The prime repository for state-of-the-art Multilingual Question Answering research and development.
-
elastiknn
Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
Project mention: Ask HN: Has Anyone Trained a personal LLM using their personal notes? | news.ycombinator.com | 2024-04-03
I'm currently looking to implement this locally, using Qdrant [1] for instance.
I'm just playing around, but it makes sense to have a runnable example for our users at work too :) [2].
[1]. https://qdrant.tech/
Project mention: Search for anything ==> Immich fails to download textual.onnx | /r/immich | 2023-09-15
Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13
txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.
Project mention: DocArray – Represent, send, and store multimodal data for ML | news.ycombinator.com | 2023-04-27
RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries.
- Content vectors can differ vastly from question vectors; to address this, use hypothetical embeddings (generate artificial questions for each chunk and store their embeddings).
- Instead of saving just one embedding per text chunk, store several (the text chunk, hypothetical question embeddings, metadata).
- RAG will fail miserably on requests like "summarize the whole document".
- To my knowledge, OpenAI embeddings don't perform well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
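The "several embeddings per chunk" idea from the comment above can be sketched as follows. This is a minimal illustration under stated assumptions, not the approach of any project listed here: `embed` is a toy bag-of-words stand-in for a real embedding model, and `Chunk`, `build_index`, and `search` are hypothetical names introduced only for this example.

```python
import math
import re
from collections import Counter
from dataclasses import dataclass, field


def embed(text):
    """Toy stand-in for an embedding model: lowercase bag-of-words counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


@dataclass
class Chunk:
    chunk_id: str
    text: str
    questions: list = field(default_factory=list)  # hypothetical questions
    metadata: dict = field(default_factory=dict)


def build_index(chunks):
    """Store several vectors per chunk: one for the text, one per question."""
    index = []
    for chunk in chunks:
        for source in [chunk.text, *chunk.questions]:
            index.append((embed(source), chunk))
    return index


def search(index, query, top_k=1):
    """Score each chunk by its best-matching vector and return the top_k chunks."""
    q = embed(query)
    best = {}
    for vector, chunk in index:
        score = cosine(q, vector)
        if score > best.get(chunk.chunk_id, (-1.0, None))[0]:
            best[chunk.chunk_id] = (score, chunk)
    ranked = sorted(best.values(), key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in ranked[:top_k]]


chunks = [
    Chunk("c1", "Qdrant stores dense vectors and payloads.",
          questions=["Which database keeps embeddings?"],
          metadata={"source": "notes.md"}),
    Chunk("c2", "Chunk boundaries can split a sentence in half.",
          questions=["Why does chunking hurt retrieval?"],
          metadata={"source": "notes.md"}),
]
index = build_index(chunks)
result = search(index, "why does chunking hurt retrieval quality?")
```

Here the query shares no words with the text of chunk `c2`, but it does match the hypothetical question stored alongside it, so the chunk is still retrieved. With a real embedding model the same structure applies; only `embed` and the similarity function would change.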
If you are interested, you can check out the documentation here: https://github.com/raphaelsty/cherche
Project mention: [P] Introducing Neural-Cherche: Enhance Document Retrieval with Advanced AI Models | /r/MachineLearning | 2023-11-19
I'm excited to share a tool I've developed called Neural-Cherche. Its main purpose is to transform a Sentence Transformer into a ColBERT model, which is currently at the forefront of information retrieval tools.
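For readers unfamiliar with ColBERT, its scoring rule (often called late interaction or MaxSim) compares every query token embedding with every document token embedding, keeps the best match per query token, and sums those maxima. The sketch below shows only that rule on hand-made 2-d vectors; it does not use or reflect the Neural-Cherche API, and `normalize` and `maxsim_score` are hypothetical helper names.

```python
import math


def normalize(vec):
    """Scale a vector to unit length so dot products are cosine similarities."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm else list(vec)


def maxsim_score(query_vecs, doc_vecs):
    """Late interaction: for each query token take its best-matching
    document token (max cosine similarity), then sum over query tokens."""
    q = [normalize(v) for v in query_vecs]
    d = [normalize(v) for v in doc_vecs]
    return sum(
        max(sum(a * b for a, b in zip(qv, dv)) for dv in d)
        for qv in q
    )


# Hand-made 2-d "token embeddings", purely illustrative.
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 2.0], [1.0, 1.0]]  # covers both query tokens
doc_b = [[1.0, 0.0], [3.0, 0.0]]              # covers only the first

score_a = maxsim_score(query, doc_a)
score_b = maxsim_score(query, doc_b)
```

`doc_a` scores higher than `doc_b` because each query token finds a close match among its tokens, whereas a single pooled vector per document (as in a plain Sentence Transformer setup) would lose that token-level matching.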
As mentioned previously, all of the main components of txtai can be replaced with custom components. For example, there are external integrations for storing dense vectors in Weaviate and Qdrant to name a few.
neural-search related posts
- Jina.ai: Self-host Multimodal models
- [P] Introducing Neural-Cherche: Enhance Document Retrieval with Advanced AI Models
- FLaNK Stack Weekly for 30 Oct 2023
- External database integration
- Langchain Is Pointless
- [P] Semantic search
- Minimalist semantic search with Cherche 2.0
Index
What are some of the best open-source neural-search projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | jina | 20,009 |
2 | qdrant | 17,839 |
3 | clip-as-service | 12,181 |
4 | PaddleNLP | 11,386 |
5 | Weaviate | 9,436 |
6 | txtai | 6,953 |
7 | dalle-flow | 2,824 |
8 | docarray | 2,739 |
9 | finetuner | 1,423 |
10 | mteb | 1,372 |
11 | refinery | 1,360 |
12 | primeqa | 698 |
13 | vectordb | 462 |
14 | elastiknn | 352 |
15 | cherche | 311 |
16 | neural-cherche | 295 |
17 | react-search | 24 |
18 | weaviate-txtai | 7 |
19 | AquilaHub | 2 |