Top 19 neural-search Open-Source Projects

jina

Mentions: 126 | Stars: 20,009 | Activity: 9.2 | Language: Python

☁️ Build multimodal AI applications with cloud-native stack

Project mention: Jina.ai: Self-host Multimodal models | news.ycombinator.com | 2024-01-26

qdrant

Mentions: 139 | Stars: 17,839 | Activity: 9.9 | Language: Rust

Qdrant - High-performance, massive-scale Vector Database for the next generation of AI. Also available in the cloud https://cloud.qdrant.io/

Project mention: Ask HN: Has Anyone Trained a personal LLM using their personal notes? | news.ycombinator.com | 2024-04-03

I'm currently looking to implement locally, using QDrant [1] for instance.
I'm just playing around, but it makes sense to have a runnable example for our users at work too :) [2].
[1]. https://qdrant.tech/

clip-as-service

Mentions: 15 | Stars: 12,181 | Activity: 5.2 | Language: Python

🏄 Scalable embedding, reasoning, ranking for images and sentences with CLIP

Project mention: Search for anything ==> Immich fails to download textual.onnx | /r/immich | 2023-09-15

PaddleNLP

Mentions: 2 | Stars: 11,386 | Activity: 9.8 | Language: Python

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis etc.
Weaviate

Mentions: 76 | Stars: 9,436 | Activity: 10.0 | Language: Go

Weaviate is an open-source vector database that stores both objects and vectors, allowing for the combination of vector search with structured filtering with the fault tolerance and scalability of a cloud-native database.

Project mention: pgvecto.rs alternatives - qdrant and Weaviate | libhunt.com/r/pgvecto.rs | 2024-03-13
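The Weaviate description above mentions combining vector search with structured filtering. The idea can be sketched in plain Python as follows. This is a minimal illustration only, not Weaviate's API: the object layout, `filtered_search`, and the `where` argument are hypothetical names chosen for this example, and a real vector database would use an approximate index rather than a full scan.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def filtered_search(objects, query_vector, where, limit=3):
    """Keep objects whose properties match `where`, then rank by similarity."""
    candidates = [
        obj for obj in objects
        if all(obj["properties"].get(key) == value for key, value in where.items())
    ]
    ranked = sorted(
        candidates,
        key=lambda obj: cosine(obj["vector"], query_vector),
        reverse=True,
    )
    return ranked[:limit]

# Hypothetical stored objects: each has structured properties and a vector.
OBJECTS = [
    {"id": "a", "properties": {"lang": "en"}, "vector": [1.0, 0.0]},
    {"id": "b", "properties": {"lang": "de"}, "vector": [0.9, 0.1]},
    {"id": "c", "properties": {"lang": "en"}, "vector": [0.0, 1.0]},
]

hits = filtered_search(OBJECTS, [1.0, 0.1], {"lang": "en"}, limit=2)
print([hit["id"] for hit in hits])
```

Object "b" is close to the query vector but is excluded by the structured filter, which is the behavior the description refers to.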

txtai

Mentions: 354 | Stars: 6,953 | Activity: 9.3 | Language: Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

Project mention: Build knowledge graphs with LLM-driven entity extraction | dev.to | 2024-02-21

txtai is an all-in-one embeddings database for semantic search, LLM orchestration and language model workflows.

dalle-flow

Mentions: 31 | Stars: 2,824 | Activity: 2.3 | Language: Python

🌊 A Human-in-the-Loop workflow for creating HD images from text
docarray

Mentions: 32 | Stars: 2,739 | Activity: 9.2 | Language: Python

Represent, send, store and search multimodal data

Project mention: DocArray – Represent, send, and store multimodal data for ML | news.ycombinator.com | 2023-04-27

finetuner

Mentions: 36 | Stars: 1,423 | Activity: 5.5 | Language: Python

🎯 Task-oriented embedding tuning for BERT, CLIP, etc.
mteb

Mentions: 2 | Stars: 1,372 | Activity: 9.1 | Language: Python

MTEB: Massive Text Embedding Benchmark

Project mention: AI for AWS Documentation | news.ycombinator.com | 2023-07-06

RAG is very difficult to do right. I am experimenting with various RAG projects from [1]. The main problems are:
- Chunking can interfere with context boundaries
- Content vectors can differ vastly from question vectors; for this you have to use hypothetical embeddings (they generate artificial questions and store them)
- Instead of saving just one embedding per text chunk, you should store several (text chunk, hypothetical embedding questions, metadata)
- RAG will fail miserably with requests like "summarize the whole document"
- To my knowledge, OpenAI embeddings aren't performing well; use an embedding that is optimized for question answering or information retrieval and supports multiple languages. Also look into instructor embeddings: https://github.com/embeddings-benchmark/mteb
[1] https://github.com/underlines/awesome-marketing-datascience/...
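The advice above about storing several embeddings per chunk (the chunk text plus hypothetical questions) can be sketched as follows. This is a rough illustration under stated assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, the questions are hand-written here rather than generated by an LLM, and `build_index` and `search` are hypothetical helper names.

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words embedding; a stand-in for a real embedding model."""
    cleaned = text.lower().replace("?", "").replace(".", "")
    return Counter(cleaned.split())

def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[term] * b[term] for term in a if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def build_index(chunks):
    """Store one vector for the chunk text and one per hypothetical question."""
    index = []
    for chunk_id, chunk in chunks.items():
        index.append((chunk_id, "text", embed(chunk["text"])))
        for question in chunk["questions"]:
            index.append((chunk_id, "question", embed(question)))
    return index

def search(index, query, limit=1):
    """Score each chunk by its best-matching vector and return top chunk ids."""
    query_vector = embed(query)
    best = {}
    for chunk_id, _kind, vector in index:
        score = cosine(query_vector, vector)
        best[chunk_id] = max(best.get(chunk_id, 0.0), score)
    return sorted(best, key=best.get, reverse=True)[:limit]

# Hypothetical chunks with hand-written "artificial questions".
CHUNKS = {
    "refund": {
        "text": "Purchases can be returned within 30 days for the full amount.",
        "questions": ["How do I get a refund?", "What is the return policy?"],
    },
    "shipping": {
        "text": "Orders ship within two business days.",
        "questions": ["How long does delivery take?"],
    },
}

index = build_index(CHUNKS)
print(search(index, "how do I get a refund"))
```

In this toy setup the query shares no words with the "refund" chunk text, so the chunk is found only through its stored question vector, which is the mismatch between content and question vectors that the post describes.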

refinery

Mentions: 20 | Stars: 1,360 | Activity: 4.6 | Language: Python

The data scientist's open-source choice to scale, assess and maintain natural language data. Treat training data like a software artifact.
primeqa

Mentions: 5 | Stars: 698 | Activity: 8.8 | Language: Python

The prime repository for state-of-the-art Multilingual Question Answering research and development.

Project mention: State-of-the-Art Multilingual Question Answering | /r/aiengineer | 2023-07-10

vectordb

Mentions: 1 | Stars: 462 | Activity: 7.9 | Language: Python

A Python vector database you just need - no more, no less. (by jina-ai)

Project mention: A Python Vector Database | news.ycombinator.com | 2023-08-13

elastiknn

Mentions: 1 | Stars: 352 | Activity: 8.8 | Language: Scala

Elasticsearch plugin for nearest neighbor search. Store vectors and run similarity search using exact and approximate algorithms.
cherche

Mentions: 12 | Stars: 311 | Activity: 4.4 | Language: Python

Neural Search

Project mention: [P] Semantic search | /r/MachineLearning | 2023-05-08

If you are interested, you can check out the documentation here: https://github.com/raphaelsty/cherche

neural-cherche

Mentions: 2 | Stars: 295 | Activity: 8.1 | Language: Python

Neural Search

Project mention: [P] Introducing Neural-Cherche: Enhance Document Retrieval with Advanced AI Models | /r/MachineLearning | 2023-11-19

I'm excited to share a tool I've developed called Neural-Cherche. Its main purpose is to transform a Sentence Transformer into a ColBERT model, which is currently at the forefront of information retrieval tools.
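For readers unfamiliar with ColBERT, its scoring step (often called late interaction or MaxSim) can be sketched as below: each query token embedding is matched to its most similar document token embedding, and those maxima are summed. This is a generic illustration with made-up two-dimensional token vectors, not Neural-Cherche's API; `maxsim` is a hypothetical helper name.

```python
def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def maxsim(query_tokens, doc_tokens):
    """Late-interaction score: for each query token embedding, take the best
    match among document token embeddings, then sum over query tokens."""
    return sum(max(dot(q, d) for d in doc_tokens) for q in query_tokens)

# Made-up unit-length token embeddings (one vector per token).
query = [[1.0, 0.0], [0.0, 1.0]]
doc_a = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]
doc_b = [[0.6, 0.8]]

scores = {"doc_a": maxsim(query, doc_a), "doc_b": maxsim(query, doc_b)}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Because the document keeps one vector per token instead of a single pooled vector, `doc_a` can match each query token exactly, while `doc_b` only partially matches both.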

react-search

Mentions: 2 | Stars: 24 | Activity: 8.4 | Language: TypeScript

UI widget for adding semantic search to your React UI in just a few lines of code

Project mention: FLaNK 04 March 2024 | dev.to | 2024-03-04

weaviate-txtai

Mentions: 2 | Stars: 7 | Activity: 4.4 | Language: Python

An integration of the weaviate vector search engine with txtai

Project mention: External database integration | dev.to | 2023-09-07

As mentioned previously, all of the main components of txtai can be replaced with custom components. For example, there are external integrations for storing dense vectors in Weaviate and Qdrant to name a few.

AquilaHub

Mentions: 2 | Stars: 2 | Activity: 2.9 | Language: Python

Load and serve Neural Encoder Models

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

neural-search related posts

Jina.ai: Self-host Multimodal models
1 project | news.ycombinator.com | 26 Jan 2024
[P] Introducing Neural-Cherche: Enhance Document Retrieval with Advanced AI Models
1 project | /r/MachineLearning | 19 Nov 2023
FLaNK Stack Weekly for 30 Oct 2023
24 projects | dev.to | 30 Oct 2023
External database integration
2 projects | dev.to | 7 Sep 2023
Langchain Is Pointless
16 projects | news.ycombinator.com | 8 Jul 2023
[P] Semantic search
1 project | /r/MachineLearning | 8 May 2023
Minimalist semantic search with Cherche 2.0
1 project | news.ycombinator.com | 8 May 2023

Index

What are some of the best open-source neural-search projects? This list will help you:

#	Project	Stars
1	jina	20,009
2	qdrant	17,839
3	clip-as-service	12,181
4	PaddleNLP	11,386
5	Weaviate	9,436
6	txtai	6,953
7	dalle-flow	2,824
8	docarray	2,739
9	finetuner	1,423
10	mteb	1,372
11	refinery	1,360
12	primeqa	698
13	vectordb	462
14	elastiknn	352
15	cherche	311
16	neural-cherche	295
17	react-search	24
18	weaviate-txtai	7
19	AquilaHub	2