Top 6 Python chunking Projects
-
NCRFpp
NCRF++ is a neural sequence labeling toolkit. It can be applied to any sequence labeling task (e.g. NER, POS tagging, segmentation) and includes character LSTM/CNN, word LSTM/CNN, and softmax/CRF components.
-
semchunk
A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.
-
TalkWithYourFiles
An LLM GUI application that lets you interact with your files and offers parameters you can adjust at runtime to change response behavior.
-
betterhtmlchunking
BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, which simplifies content analysis. It is well suited to LLM-based processing.
Project mention: BetterHTMLChunking: A Python library for intelligent HTML segmentation | news.ycombinator.com | 2025-02-14
-
Project mention: Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets | news.ycombinator.com | 2024-10-11
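To give a rough sense of what the text-chunking libraries in this list do, here is a minimal sketch of greedy, sentence-aware chunking. It is not the implementation of semchunk or any other project listed; the function name `chunk_text`, the `max_words` parameter, and the naive regex sentence splitter are all assumptions made for illustration.

```python
import re


def chunk_text(text: str, max_words: int) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Illustrative sketch only: real libraries typically count tokens with a
    tokenizer and use more careful boundary detection.
    """
    if max_words < 1:
        raise ValueError("max_words must be at least 1")

    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    for sentence in sentences:
        words = sentence.split()
        # A sentence longer than the limit falls back to word-boundary splits.
        pieces = [words[i:i + max_words] for i in range(0, len(words), max_words)]
        for piece in pieces:
            # Close the current chunk if this piece would overflow it.
            if current and len(current) + len(piece) > max_words:
                chunks.append(" ".join(current))
                current = []
            current.extend(piece)
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five. Six seven eight nine ten eleven."
    for chunk in chunk_text(sample, max_words=4):
        print(chunk)
```

Counting words keeps the sketch dependency-free; swapping `len(piece)` for a tokenizer-based count would bring it closer to how LLM-oriented chunkers usually measure chunk size.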
Index
What are some of the best open-source chunking projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | NCRFpp | 1,895 |
2 | semchunk | 287 |
3 | TalkWithYourFiles | 91 |
4 | betterhtmlchunking | 33 |
5 | NotEnoughAV1Encodes-Qt | 30 |
6 | bpe-tokenizer | 6 |