Python chunking

Open-source Python projects categorized as chunking

Top 6 Python chunking Projects

  1. NCRFpp

    NCRF++, a neural sequence labeling toolkit. Easy to use for any sequence labeling task (e.g. NER, POS tagging, segmentation). It includes character LSTM/CNN, word LSTM/CNN, and softmax/CRF components.

  2. semchunk

    A fast, lightweight and easy-to-use Python library for splitting text into semantically meaningful chunks.

  3. TalkWithYourFiles

    An LLM GUI application that lets you interact with your files, with dynamic parameters that modify response behavior at runtime.

  4. betterhtmlchunking

    BetterHTMLChunking is a Python library for intelligent HTML segmentation. It builds a DOM tree from raw HTML and extracts content-rich regions of interest, making content analysis effortless. Great for LLM-based processing.

    Project mention: BetterHTMLChunking: A Python library for intelligent HTML segmentation | news.ycombinator.com | 2025-02-14
  5. NotEnoughAV1Encodes-Qt

    Linux GUI for AV1 Encoders

  6. bpe-tokenizer

    Byte-Pair Encoding tokenizer for training large language models on huge datasets

    Project mention: Show HN: Byte-Pair Encoding tokenizer for training LLMs on large datasets | news.ycombinator.com | 2024-10-11
NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
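
For readers new to the category, the basic idea behind text chunking can be illustrated with a short sketch. This is a generic greedy approach written for illustration only; it is not the algorithm used by semchunk or any other project listed here, and the function name `chunk_text` and the `max_words` parameter are hypothetical.

```python
import re


def chunk_text(text: str, max_words: int = 50) -> list[str]:
    """Greedily pack whole sentences into chunks of at most max_words words.

    Illustrative sketch only: a single sentence longer than max_words is
    kept intact and becomes its own (oversized) chunk.
    """
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for sentence in sentences:
        n = len(sentence.split())
        # Close the current chunk if this sentence would exceed the budget.
        if current and count + n > max_words:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(sentence)
        count += n

    # Flush whatever is left over as the final chunk.
    if current:
        chunks.append(" ".join(current))
    return chunks


if __name__ == "__main__":
    sample = "One two three. Four five six. Seven eight nine ten."
    for chunk in chunk_text(sample, max_words=6):
        print(chunk)
```

Counting words rather than model tokens keeps the sketch dependency-free; a real pipeline would more likely measure chunk size with the tokenizer of the target model.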

Python chunking related posts

  • DeepRAG: Thinking to Retrieval Step by Step for Large Language Models

    4 projects | news.ycombinator.com | 4 Feb 2025
  • Show HN: Chonkie – A Fast, Lightweight Text Chunking Library for RAG

    2 projects | news.ycombinator.com | 10 Nov 2024
  • semchunk alternatives - text-splitter and langchain

    3 projects | 9 Nov 2023

Index

What are some of the best open-source chunking projects in Python? This list will help you:

#  Project                  Stars
1  NCRFpp                   1,895
2  semchunk                   287
3  TalkWithYourFiles           91
4  betterhtmlchunking          33
5  NotEnoughAV1Encodes-Qt      30
6  bpe-tokenizer                6
