Oracle of Zotero: LLM QA of Your Research Library

The-Oracle-of-Zotero

Mentions: 2 | Stars: 358 | Activity: 7.5 | Language: Python

LLM Chain querying a scientific Zotero library, with citations

paperetl

Mentions: 12 | Stars: 315 | Activity: 6.3 | Language: Python

📄 ⚙️ ETL processes for medical and scientific papers

Nice project!
I've spent quite a lot of time in the medical/scientific literature space. With regard to LLMs, and RAG specifically, how the data is chunked is quite important. With that in mind, I have a couple of projects that might be beneficial additions.
paperetl (https://github.com/neuml/paperetl) - parses arXiv and PubMed data and integrates with GROBID to extract metadata and text from arbitrary papers.
paperai (https://github.com/neuml/paperai) - builds embeddings databases of medical/scientific papers. Supports LLM prompting, semantic workflows and vector search. Built with txtai (https://github.com/neuml/txtai).
While arbitrary chunking/splitting can work, I've found that parsing with knowledge of medical/scientific paper structure increases the overall accuracy and experience of downstream applications.
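
To make the chunking point concrete, here is a minimal sketch in plain Python that contrasts arbitrary fixed-size splitting with splitting on section headings. It does not use paperetl, paperai, or txtai, and it is not how those projects parse papers. The heading list, the function names, and the toy paper text are all assumptions made for illustration; real papers need a proper parser such as GROBID.

```python
# Hypothetical heading names; real papers vary widely in how sections are titled.
SECTION_HEADINGS = ("abstract", "introduction", "methods", "results", "discussion")


def fixed_chunks(text, size=50):
    """Arbitrary chunking: cut every `size` characters, ignoring structure."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def section_chunks(text):
    """Structure-aware chunking: one chunk per recognized section heading."""
    chunks = []
    name, lines = "preamble", []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower() in SECTION_HEADINGS:
            # A new section starts: flush whatever was collected so far.
            if lines:
                chunks.append({"section": name, "text": " ".join(lines)})
            name, lines = stripped.lower(), []
        elif stripped:
            lines.append(stripped)
    if lines:
        chunks.append({"section": name, "text": " ".join(lines)})
    return chunks


# Toy input, invented for this example.
paper = """Abstract
A short summary of the toy paper.
Methods
A description of how the toy study was run.
Results
A statement of what the toy study found.
"""

for chunk in section_chunks(paper):
    print(chunk["section"], "->", chunk["text"])

# Fixed-size chunks can cut a sentence in half or merge two sections.
print(fixed_chunks(paper)[:2])
```

Each section-aware chunk carries its section name, so a retrieval step could filter or weight by section (for example, preferring results over methods), which fixed-size chunks cannot support.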

paperai

Mentions: 19 | Stars: 1,196 | Activity: 5.9 | Language: Python

📄 🤖 Semantic search and workflows for medical/scientific papers

txtai

Mentions: 355 | Stars: 6,953 | Activity: 9.3 | Language: Python

💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

paper-qa

Mentions: 10 | Stars: 3,608 | Activity: 8.7 | Language: Python

LLM Chain for answering questions from documents with citations

Why does this post link to a renamed fork of Paper-QA (https://github.com/whitead/paper-qa) which has made zero changes and is 19 commits behind the original?

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

What contributing to Open-source is, and what it isn't
1 project | news.ycombinator.com | 27 Apr 2024
Build knowledge graphs with LLM-driven entity extraction
1 project | dev.to | 21 Feb 2024
Bootstrap or VC?
1 project | news.ycombinator.com | 5 Feb 2024
txtai: An embeddings database for semantic search, graph networks and RAG
1 project | news.ycombinator.com | 3 Feb 2024
Txtai: An all-in-one embeddings database for semantic search and LLM workflows
1 project | news.ycombinator.com | 24 Jan 2024

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com
Python Search Machine Learning scientific-papers NLP
Post date: 26 Nov 2023
