Top 23 question-answering Open-Source Projects

haystack

54 13,564 9.9 Python

🔍 LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

Project mention: Release Radar • March 2024 Edition | dev.to | 2024-04-07

PaddleNLP

2 11,386 9.8 Python

👑 Easy-to-use and powerful NLP and LLM library with 🤗 Awesome model zoo, supporting a wide range of NLP tasks from research to industrial applications, including 🗂Text Classification, 🔍 Neural Search, ❓ Question Answering, ℹ️ Information Extraction, 📄 Document Intelligence, 💌 Sentiment Analysis, etc.
simpletransformers

6 3,972 7.3 Python

Transformers for Information Retrieval, Text Classification, NER, QA, Language Modelling, Language Generation, T5, Multi-Modal, and Conversational AI
spark-nlp

87 3,667 9.4 Scala

State of the Art Natural Language Processing

Project mention: Spark NLP 5.1.0: Introducing state-of-the-art OpenAI Whisper speech-to-text, OpenAI Embeddings and Completion transformers, MPNet text embeddings, ONNX support for E5 text embeddings, new multi-lingual BART Zero-Shot text classification, and much more! | /r/Python | 2023-09-06

paper-qa

10 3,593 8.7 Python

LLM Chain for answering questions from documents with citations

Project mention: Oracle of Zotero: LLM QA of Your Research Library | news.ycombinator.com | 2023-11-26

Why does this post link to a renamed fork of Paper-QA (https://github.com/whitead/paper-qa) which has made zero changes and is 19 commits behind the original?

vault-ai

80 3,217 5.7 JavaScript

OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (OpenAI + Pinecone Vector Database). Upload your own custom knowledge base files (PDF, txt, epub, etc) using a simple React frontend.

Project mention: I built an open source website that lets you upload large files, such as in-depth novels/ebooks or academic papers, and ask GPT4 questions based on your specific knowledge base. So far, I've tested it with long books like the Odyssey and random research PDFs, and I'm shocked at how incisive it is. | /r/ChatGPT | 2023-08-05

llmware

9 3,056 9.8 Python

Providing enterprise-grade LLM-based development framework, tools, and fine-tuned models.

Project mention: More Agents Is All You Need: LLMs performance scales with the number of agents | news.ycombinator.com | 2024-04-06

I couldn't agree more. You should check out LLMWare's SLIM agents (https://github.com/llmware-ai/llmware/tree/main/examples/SLI...). It's focusing on pretty much exactly this and chaining multiple local LLMs together.
A really good topic that ties in with this is the need for deterministic sampling (I may have the terminology a bit incorrect) depending on what the model is intended for. The LLMWare team did a good two-part video on this here as well (https://www.youtube.com/watch?v=7oMTGhSKuNY)
I think dedicated miniature LLMs are the way forward.
Disclaimer - Not affiliated with them in any way, just think it's a really cool project.

rust-bert

7 2,415 6.8 Rust

Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)

Project mention: How to leverage the state-of-the-art NLP models in Rust | /r/infinilabs | 2023-06-07

brew install libtorch
brew link libtorch
brew ls --verbose libtorch | grep dylib
export LIBTORCH=$(brew --cellar pytorch)/$(brew info --json pytorch | jq -r '.[0].installed[0].version')
export LD_LIBRARY_PATH=${LIBTORCH}/lib:$LD_LIBRARY_PATH
git clone https://github.com/guillaume-be/rust-bert.git
cd rust-bert
ORT_STRATEGY=system cargo run --example sentence_embeddings

FARM

3 1,723 0.0 Python

🏡 Fast & easy transfer learning for NLP. Harvesting language models for the industry. Focus on Question Answering.
bootcamp

24 1,606 9.1 HTML

Dealing with all unstructured data, such as reverse image search, audio search, molecular search, video analysis, question and answer systems, NLP, etc. (by milvus-io)

Project mention: FLaNK AI - 01 April 2024 | dev.to | 2024-04-01

Awesome-LLM-Reasoning

1 1,062 7.3

Reasoning in Large Language Models: Papers and Resources, including Chain-of-Thought, Instruction-Tuning and Multimodality.

Project mention: Techbro says that GPT models will soon have over 9000 IQ in ~5 years | /r/SneerClub | 2023-05-04

Questgen.ai

3 871 6.3 Python

Question generation using state-of-the-art Natural Language Processing algorithms

Project mention: Yes/No style Question and Answer Generation | /r/learnpython | 2023-06-15

I have tried to do some searching for models but there don't seem to be ones that do what I am looking for. The closest I found was Questgen, but it only generated the questions and they, more often than not, did not make sense - especially for the types of questions I was looking to generate.

ThoughtSource

1 832 8.4 Jupyter Notebook

A central, open resource for data and tools related to chain-of-thought reasoning in large language models. Developed @ Samwald research group: https://samwald.info/
primeqa

5 696 8.8 Python

The prime repository for state-of-the-art Multilingual Question Answering research and development.

Project mention: State-of-the-Art Multilingual Question Answering | /r/aiengineer | 2023-07-10

llmflows

1 612 8.6 Python

LLMFlows - Simple, Explicit and Transparent LLM Apps

Project mention: Show HN: LLMFlows – LangChain alternative for explicit and transparent apps | news.ycombinator.com | 2023-07-29

qagnn

6 588 0.0 Python

[NAACL 2021] QAGNN: Question Answering using Language Models and Knowledge Graphs 🤖
fastT5

5 539 0.0 Python

⚡ boost inference speed of T5 models by 5x & reduce the model size by 3x.
Giveme5W1H

1 500 0.0 HTML

Extraction of the journalistic five W and one H questions (5W1H) from news articles: who did what, when, where, why, and how?
happy-transformer

1 497 9.0 Python

Happy Transformer makes it easy to fine-tune and perform inference with NLP Transformer models.
lumos

4 404 8.9 Python

Code and data for "Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs" (by allenai)

Project mention: Lumos: Learning Agents with Unified Data, Modular Design, and Open-Source LLMs | news.ycombinator.com | 2024-04-01

Guess you are looking for this - https://github.com/allenai/lumos/blob/main/README.md

PIXIU

6 393 9.0 Python

This repository introduces PIXIU, an open-source resource featuring the first financial large language models (LLMs), instruction tuning data, and evaluation benchmarks to holistically assess financial LLMs. Our goal is to continually push forward the open-source development of financial artificial intelligence (AI).

Project mention: PIXIU: NEW Data - star count:172.0 | /r/algoprojects | 2023-08-15

LinkBERT

2 389 1.8 Python

[ACL 2022] LinkBERT: A Knowledgeable Language Model 😎 Pretrained with Document Links
megabots

16 334 6.9 Python

🤖 State-of-the-art, production ready LLM apps made mega-easy, so you don't have to build them from scratch 🤯 Create a bot, now 🫵

Project mention: 🤖 Release 0.0.11 in Megabots | Memory and Vectorstores are live! | /r/LLMDevs | 2023-04-26

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2024-04-07.

question-answering related posts

Generative AI Frameworks and Tools Every Developer Should Know!
1 project | dev.to | 13 Dec 2023
Best way to programmatically extract data from a set of .pdf files?
1 project | /r/artificial | 9 Dec 2023
Llama2 and Haystack on Colab
2 projects | news.ycombinator.com | 21 Jul 2023
Build with LLMs for production with Haystack – has 10k stars on GitHub
2 projects | news.ycombinator.com | 17 Jul 2023
Show HN: Haystack – Production-Ready LLM Framework
1 project | news.ycombinator.com | 11 Jul 2023
Creating search engine for your local network - Is it even possible?
2 projects | /r/selfhosted | 2 Jul 2023
Show HN: "banks" Using Jinja as the basis of LLM prompt templating
2 projects | news.ycombinator.com | 15 Jun 2023

Index

What are some of the best open-source question-answering projects? This list will help you:

#	Project	Stars
1	haystack	13,564
2	PaddleNLP	11,386
3	simpletransformers	3,972
4	spark-nlp	3,667
5	paper-qa	3,593
6	vault-ai	3,217
7	llmware	3,056
8	rust-bert	2,415
9	FARM	1,723
10	bootcamp	1,606
11	Awesome-LLM-Reasoning	1,062
12	Questgen.ai	871
13	ThoughtSource	832
14	primeqa	696
15	llmflows	612
16	qagnn	588
17	fastT5	539
18	Giveme5W1H	500
19	happy-transformer	497
20	lumos	404
21	PIXIU	393
22	LinkBERT	389
23	megabots	334