Text Chunking Open-Source Projects
-
semchunk
A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.
Project mention: semchunk alternatives - text-splitter and langchain | libhunt.com/r/semchunk | 2023-11-09
Project mention: Pg_vectorize: The simplest way to do vector search and RAG on Postgres | news.ycombinator.com | 2024-03-06
I wrote a C# library to do this, which is similar to other chunking approaches that are common, like the way langchain does it: https://github.com/drittich/SemanticSlicer
Given a list of separators (regexes), it goes through them in order and keeps splitting the text by them until the chunk fits within the desired size. By putting the higher-level separators first (e.g., for HTML, splitting by a higher-level tag before a lower-level one), it's a pretty good proxy for maintaining context.
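The approach described in the comment above can be sketched roughly as follows. This is a minimal Python illustration, not the SemanticSlicer or semchunk implementation: the function name `split_by_separators`, the example separator list, and measuring size in characters rather than tokens are all assumptions made for the example. It also assumes the separator regexes contain no capturing groups, and it does not merge small neighbouring pieces back together, which a real library may well do.

```python
import re


def split_by_separators(text, separators, max_size):
    """Recursively split text until every chunk is at most max_size characters.

    separators is an ordered list of regex strings, highest-level first
    (hypothetical helper, written only to illustrate the idea).
    """
    # A piece that already fits is kept whole; whitespace-only pieces are dropped.
    if len(text) <= max_size:
        return [text] if text.strip() else []
    # No separators left: fall back to fixed-width slices.
    if not separators:
        return [text[i:i + max_size] for i in range(0, len(text), max_size)]
    first, rest = separators[0], separators[1:]
    chunks = []
    for piece in re.split(first, text):
        # Pieces that are still too large are split by the next, lower-level separator.
        chunks.extend(split_by_separators(piece, rest, max_size))
    return chunks


# Assumed separator order: paragraphs, then sentences, then words.
separators = [r"\n\n", r"(?<=\.)\s+", r"\s+"]
text = "Intro paragraph.\n\nSecond paragraph is a bit longer. It has two sentences."
chunks = split_by_separators(text, separators, max_size=40)
for chunk in chunks:
    print(repr(chunk))
```

Because the paragraph separator is tried first, a short paragraph stays intact, and only the oversized paragraph is broken down further at sentence boundaries.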
NOTE:
The open source projects on this list are ordered by number of GitHub stars.
The number of mentions indicates repo mentions in the last 12 months or
since we started tracking (Dec 2020).
Index
# | Project | Stars |
---|---|---|
1 | semchunk | 79 |
2 | SemanticSlicer | 8 |