text chunking

Open-source projects categorized as text chunking
Languages: Python, C#

text chunking Open-Source Projects

  • semchunk

    A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.

  • Project mention: semchunk alternatives - text-splitter and langchain | libhunt.com/r/semchunk | 2023-11-09
  • SemanticSlicer

    A recursive text chunker that attempts to preserve context.

  • Project mention: Pg_vectorize: The simplest way to do vector search and RAG on Postgres | news.ycombinator.com | 2024-03-06

    I wrote a C# library to do this, which is similar to other chunking approaches that are common, like the way langchain does it: https://github.com/drittich/SemanticSlicer

    Given a list of separators (regexes), it goes through them in order and keeps splitting the text by them until the chunk fits within the desired size. By putting the higher-level separators first (e.g., for HTML, splitting on a higher-level tag before a lower-level one), it's a pretty good proxy for maintaining context.
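
    The approach described in the comment above can be sketched roughly as follows. This is a minimal illustration in Python, not SemanticSlicer's actual C# implementation: the function name recursive_split, the example separators, and the choice to measure size in characters rather than tokens are all assumptions made for the example. Unlike a production chunker, it does not merge small neighbouring pieces back together.

```python
import re


def recursive_split(text, separators, max_len):
    """Split text with separators in priority order until every chunk fits.

    Hypothetical helper for illustration only. `separators` is a list of
    regex patterns (without capturing groups), highest-level first.
    `max_len` is a character count, which is an assumption; a real
    chunker would more likely count tokens.
    """
    text = text.strip()
    if len(text) <= max_len:
        return [text] if text else []
    if not separators:
        # No separators left: fall back to a hard cut every max_len characters.
        return [text[i:i + max_len] for i in range(0, len(text), max_len)]

    first, rest = separators[0], separators[1:]
    pieces = [p for p in re.split(first, text) if p.strip()]
    if len(pieces) <= 1:
        # This separator did not divide the text; try the next, finer one.
        return recursive_split(text, rest, max_len)

    chunks = []
    for piece in pieces:
        # Pieces that are still too long are split by the finer separators.
        chunks.extend(recursive_split(piece, rest, max_len))
    return chunks


if __name__ == "__main__":
    sample = "Intro paragraph.\n\nSecond paragraph is longer. It has two sentences."
    # Paragraph breaks first, then sentence ends, then any whitespace.
    example_separators = [r"\n\n", r"(?<=\.)\s+", r"\s+"]
    for chunk in recursive_split(sample, example_separators, max_len=30):
        print(repr(chunk))
```

    Because the coarse separators are tried first, text is only broken at a finer boundary when a coarser one could not produce a small enough chunk, which is what keeps related text together.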

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).


text chunking related posts

  • semchunk alternatives - text-splitter and langchain

    3 projects | 9 Nov 2023

Index

#  Project         Stars
1  semchunk        79
2  SemanticSlicer  8
