| semantifly | chonkie
---|---|---
Mentions | 1 | 3
Stars | 14 | 2,845
Growth | - | 18.6%
Activity | 7.2 | 9.8
Last commit | 7 months ago | 3 days ago
Language | Go | Python
License | MIT License | MIT License
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
semantifly
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
My company (actually our two amazing interns) was working on this over the summer. We abandoned it, but it’s 85% of the way to doing what you want it to do: https://github.com/accretional/semantifly
We stopped working on it mostly because we had higher priorities and because I became pretty disillusioned with top-K RAG. We had to build out a better workflow system anyway, and with that we could instead just have models write and run specific queries (e.g. list all .ts files containing the word “DatabaseClient”), and otherwise have their context set explicitly by users.
The problem with RAG is that simplistic implementations distract and slow down models. You probably need an implementation that makes multiple passes to prune the context down to what you need to get good results, but that’s complicated enough that you might want to build something else that gives you more bang for your buck.
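As a rough illustration of the "specific query" idea in the comment above, here is a minimal Python sketch of the kind of tool a model could call instead of relying on top-K retrieval. The function name `find_files_containing` and its parameters are hypothetical. They are not taken from semantifly or any other project mentioned here.

```python
from pathlib import Path


def find_files_containing(root: str, suffix: str, needle: str) -> list[str]:
    """Return sorted paths (relative to root) of files ending in `suffix` that contain `needle`.

    Hypothetical helper for illustration only; not part of semantifly.
    """
    matches = []
    for path in Path(root).rglob(f"*{suffix}"):
        if not path.is_file():
            continue
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file: skip it rather than fail the whole query
        if needle in text:
            matches.append(path.relative_to(root).as_posix())
    return sorted(matches)
```

A call such as `find_files_containing("my-project", ".ts", "DatabaseClient")` returns an exact, explainable file list, so the context handed to the model contains only what the query asked for.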
chonkie
- Chunking your data for RAG
The Textractor extracts chunks of text from files and the Embeddings takes those chunks and builds an index/database. We'll use a late chunker backed by Chonkie.
- DeepRAG: Thinking to Retrieval Step by Step for Large Language Models
Semantic chunking is where I would start now. Also check this out: https://github.com/chonkie-ai/chonkie
- Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG
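The chunk-then-index flow described in the "Chunking your data for RAG" excerpt above can be sketched in plain Python. This is a toy under stated assumptions: it does not use the Chonkie or txtai APIs, fixed-size overlapping word windows stand in for the late chunker, and word counts stand in for a real embedding model. All function names are hypothetical.

```python
import math
import re
from collections import Counter


def chunk_words(text: str, size: int = 8, overlap: int = 2) -> list[str]:
    """Split text into windows of `size` words; consecutive windows share `overlap` words."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        if start + size >= len(words):
            break  # the last window already reached the end of the text
    return chunks


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts. A real pipeline would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def build_index(chunks: list[str]) -> list[tuple[str, Counter]]:
    """Pair every chunk with its vector; this list plays the role of the index/database."""
    return [(chunk, embed(chunk)) for chunk in chunks]


def search(index: list[tuple[str, Counter]], query: str, k: int = 1) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]
```

The overlap keeps a sentence that straddles a window boundary visible in both neighboring chunks. Semantic or late chunking replaces the fixed windows with boundaries chosen from the content itself, while the index and search steps stay the same in outline.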
What are some alternatives?
tldw - tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer' (Open Source NotebookLM)
rag-demystified - An LLM-powered advanced RAG pipeline built from scratch
webwright - Webwright is an AI-powered terminal emulator that lives within your OS. It eliminates time spent on repetitive tasks, conjures code, summons software, and bends the OS to its will. Are you ready to release the ghost in your shell?
dsRAG - High-performance retrieval engine for unstructured data
TalkWithYourFiles - An LLM GUI application; enables you to interact with your files, offering dynamic parameters that can modify response behavior during runtime.
ragdata - 📚 Build knowledge bases for RAG