vault-ai vs paper-qa

| | vault-ai | paper-qa |
|---|---|---|
| Mentions | 80 | 21 |
| Stars | 3,363 | 7,551 |
| Growth | 0.4% | 2.0% |
| Activity | 5.2 | 9.6 |
| Last Commit | 10 days ago | 3 days ago |
| Language | JavaScript | Python |
| License | MIT License | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
vault-ai
- I built an open source website that lets you upload large files, such as in-depth novels/ebooks or academic papers, and ask GPT-4 questions based on your specific knowledge base. So far, I've tested it with long books like the Odyssey and random research PDFs, and I'm shocked at how incisive it is.
- Any better alternatives to fine-tuning GPT-3 yet to create a custom chatbot persona based on provided knowledge for others to use?
There's this GitHub repo that pairs a Pinecone vector database with a custom knowledge base: VaultAI. But I'm sure the costs would be exorbitant at scale. Basically it grounds the model on your specific files, but the API is expensive, as expected. Edit: I misread and thought you were talking about training your own, sorry. But I'll leave the second paragraph up anyways lol. Someone mentioned LLaMA and another Falcon; the latter I hadn't heard of, but it looks good too.
- I built an open source website that lets you upload large files such as academic PDFs or books and ask ChatGPT questions based on your custom knowledge base. So far, I've tried it with long ebooks like Plato's Republic, old letters, and random academic PDFs, and it works shockingly well.
Check out the instructions in the README here! You may need a little command-line know-how, but ChatGPT can help guide you if you provide it the contents of the README.
- Are there any good free GPT-powered AI summarizers for very long text?
- I built an open source website that lets you upload large files, such as long ebooks or academic papers, and ask ChatGPT questions about your specific knowledge base. So far, I've tested it with long ebooks like the Odyssey and random research PDFs, and I'm shocked at how incisive it is.
Yes, this use-case is actually a perfect fit – it deals very well with any type of manual with lots of human-readable text (as opposed to charts or code). It is also better at answering more specific questions, so the example you gave about diagnosing engine issues is a really good match for what this is capable of. If you want to try it out, you can check out the deployed version of the code here: https://vault.pash.city
- Any help condensing academic journal articles using ChatGPT?
Have you tried Vault AI? Saw it pop up on a couple of other subreddits!
- OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (OpenAI + Pinecone Vector Database). Upload your own custom knowledge base files (PDF, txt, epub, etc.) using a simple React frontend. (A minimal sketch of the OpenAI + Pinecone pattern follows this list.)
- OP Vault ChatGPT: Give ChatGPT long-term memory using the OP Stack (https://github.com/pashpashpash/vault-ai) (April 2023)
- Using ChatGPT to read multiple PDFs and create writing using them as sources
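
The OP Stack named in the posts above is just OpenAI plus Pinecone: embed the document chunks, store the vectors in a Pinecone index, then at question time embed the query, pull the nearest chunks, and hand them to GPT-4 as context. Below is a minimal Python sketch of that pattern; it is not vault-ai's actual code (the comparison table lists the project as JavaScript), and the index name, model choices, and sample chunks are assumptions.

```python
# Minimal sketch of the OP Stack pattern (OpenAI + Pinecone).
# Not vault-ai's actual code; the index name, models, and sample
# chunks below are illustrative placeholders.
from openai import OpenAI
from pinecone import Pinecone

client = OpenAI()  # reads OPENAI_API_KEY from the environment
index = Pinecone(api_key="YOUR_PINECONE_KEY").Index("vault")

def embed(texts: list[str]) -> list[list[float]]:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return [item.embedding for item in resp.data]

# 1. Upload: chunk the files, embed each chunk, upsert the vectors.
chunks = ["Tell me, O Muse, of that ingenious hero ...", "He saw many cities ..."]
index.upsert(vectors=[
    {"id": f"chunk-{i}", "values": vec, "metadata": {"text": text}}
    for i, (text, vec) in enumerate(zip(chunks, embed(chunks)))
])

# 2. Ask: embed the question, fetch the nearest chunks, answer with GPT-4.
question = "Who is the hero of this book?"
hits = index.query(vector=embed([question])[0], top_k=4, include_metadata=True)
context = "\n".join(match.metadata["text"] for match in hits.matches)
reply = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user",
               "content": f"Answer using only this context:\n{context}\n\nQ: {question}"}],
)
print(reply.choices[0].message.content)
```

vault-ai wraps this same loop in a React frontend with file parsing for PDF, txt, and epub uploads, per its own description above.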
paper-qa
- Show HN: Trieve CLI – Terminal-Based LLM Agent Loop with Search Tool for PDFs
https://github.com/Future-House/paper-qa?tab=readme-ov-file#... :
> PaperQA2 is engineered to be the best agentic RAG model for working with scientific papers.
> [ Semantic Scholar, CrossRef, ]
paperqa-zotero: https://github.com/lejacobroy/paperqa-zotero
The Oracle of Zotero is a fork of paperqa-zotero that uses FAISS and LangChain.
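
For readers who want to try PaperQA2 itself, basic usage is only a few lines. Here is a sketch following the project's README; treat the exact import names and settings fields as assumptions, since the API has shifted between releases:

```python
# Sketch of basic PaperQA2 usage following the project's README.
# Treat the exact names (ask, Settings, paper_directory) as
# assumptions; the API has changed between releases.
from paperqa import Settings, ask

answer = ask(
    "What are the main drivers of RAG accuracy in PaperQA2?",
    settings=Settings(paper_directory="my_papers"),  # folder of local PDFs
)
print(answer)
```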
- Hard problems that reduce to document ranking
Great article, I’ve had similar findings! LLM-based “document-chunk” ranking is a core feature of PaperQA2 (https://github.com/Future-House/paper-qa) and part of why it works so well for scientific Q&A compared to traditional embedding-similarity RAG systems.
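
The comment above doesn't show PaperQA2's internals, but the idea of LLM-based chunk ranking is easy to sketch: rather than trusting embedding similarity alone, ask a model to score each candidate chunk against the question and keep only the top scorers. A generic illustration, with the model name and prompt as assumptions:

```python
# Generic sketch of LLM-based chunk ranking (not PaperQA2's internals).
# The model name and prompt are illustrative assumptions.
from openai import OpenAI

client = OpenAI()

def score_chunk(question: str, chunk: str) -> float:
    """Ask the model to rate a chunk's relevance to the question, 0-10."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": (f"Question: {question}\n\nText: {chunk}\n\n"
                        "Rate the text's relevance to the question from 0 to 10. "
                        "Reply with the number only."),
        }],
    )
    try:
        return float(resp.choices[0].message.content.strip())
    except ValueError:
        return 0.0  # unparseable reply counts as irrelevant

def rerank(question: str, chunks: list[str], top_k: int = 5) -> list[str]:
    """Keep the top_k chunks by LLM-judged relevance instead of
    trusting embedding similarity alone."""
    scored = sorted(chunks, key=lambda c: score_chunk(question, c), reverse=True)
    return scored[:top_k]
```

The extra LLM calls make each query more expensive, which is exactly the cost-for-accuracy trade-off discussed in the PaperQA2 experiments further down.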
- Show HN: Kreuzberg – Modern async Python library for document text extraction
Love the name!
OCR has been discussed here several times lately (https://news.ycombinator.com/item?id=42952605 and https://news.ycombinator.com/item?id=42871143), and some cool projects like https://github.com/Future-House/paper-qa?tab=readme-ov-file#... are using PyMuPDF. My experience with Tesseract is pretty sad; it's usually not good enough, and modern LLMs are better.
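
For reference, the PyMuPDF path mentioned above is only a few lines when the PDF carries a real text layer; a minimal sketch with a placeholder file name:

```python
# Minimal PyMuPDF text extraction (pip install pymupdf).
# Works when the PDF has an embedded text layer; scanned pages
# still need OCR. The file name is a placeholder.
import fitz  # PyMuPDF's import name

doc = fitz.open("paper.pdf")
text = "\n".join(page.get_text() for page in doc)
doc.close()
print(text[:500])
```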
- Show HN: I made a website to semantically search ArXiv papers
This is awesome! If you’re interested, you could add a search tool client for your backend in paper-qa (https://github.com/Future-House/paper-qa). Then paper-qa users would be able to use your semantic search as part of its workflow.
- Language agents achieve superhuman synthesis of scientific knowledge
- WikiCrow: An article for every gene in the human genome
We used an open-source AI RAG library, PaperQA2 (https://github.com/Future-House/paper-qa), to generate well-cited articles for every gene in the human genome, ~15k of which had no prior articles. To assess factuality, we tested our generated claims against each gene's human-written Wikipedia article in a blinded study evaluated by PhD biologists. Our system's articles were, on average, more precise than the cited claims in the existing articles. (https://paper.wikicrow.ai)
The system is scalable in that we can comfortably generate all 19.2k gene articles once per week, building a repository of cited articles that automatically syncs with all published literature.
- PaperQA2: RAG model for working with scientific papers
- Exchanging more frontier LLM compute for higher accuracy in RAG systems
We're sharing some experiments in designing RAG systems via the open-source PaperQA2 system (https://github.com/Future-House/paper-qa). PaperQA2's design is interesting because it isn't concerned with cost, so it uses expensive operations like agentic tool calling, LLM-based re-ranking, and contextual summarization for each query.
Even though the costs are higher, we find that the RAG accuracy gains (on question-answering tasks) are worth it. Including LLM chunk re-ranking and contextual summaries in your RAG flow also makes the system robust to changes in chunk sizes, parsing oddities, and embedding-model shortcomings. It's one of the largest drivers of performance we could find.
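
To make the contextual-summarization step concrete: each retrieved chunk is condensed by an LLM with the question in view, and the final answer is generated from those focused notes rather than from raw chunks. A rough sketch of the pattern, not PaperQA2's implementation, with model and prompts as assumptions:

```python
# Sketch of per-query contextual summarization (illustrative only,
# not PaperQA2's implementation; model and prompts are assumptions).
from openai import OpenAI

client = OpenAI()

def contextual_summary(question: str, chunk: str) -> str:
    """Condense a chunk with the question in view, so the final
    prompt sees focused notes instead of raw text."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": (f"Summarize the text below only as it bears on the "
                        f"question.\nQuestion: {question}\n\nText: {chunk}"),
        }],
    )
    return resp.choices[0].message.content

def answer(question: str, chunks: list[str]) -> str:
    """Answer from the per-chunk summaries rather than the raw chunks."""
    notes = "\n\n".join(contextual_summary(question, c) for c in chunks)
    resp = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user",
                   "content": f"Using these notes, answer: {question}\n\n{notes}"}],
    )
    return resp.choices[0].message.content
```

Because every chunk gets its own LLM call before the answer step, per-query cost scales with the number of retrieved chunks, which is the compute-for-accuracy exchange described above.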
- PaperQA2: Superhuman Scientific Literature Search
- Show HN: PaperQA2, Agentic RAG for Science
What are some alternatives?
ai-pdf-chatbot-langchain - AI PDF chatbot agent built with LangChain & LangGraph
OpenGPT - A framework for creating grounded, instruction-based datasets and training conversational domain-expert Large Language Models (LLMs).
chatgpt-memory - Scales the ChatGPT API to multiple simultaneous sessions with infinite contextual and adaptive memory, powered by GPT and a Redis datastore.
simple-llm-finetuner - Simple UI for LLM Model Finetuning
unstructured - Convert documents to structured data effortlessly. Unstructured is an open-source ETL solution for transforming complex documents into clean, structured formats for language models. Visit our website to learn more about our enterprise-grade Platform product for production-grade workflows, partitioning, enrichment, chunking, and embedding.
The-Oracle-of-Zotero - LLM Chain querying a scientific Zotero library, with citations
