-
semchunk
A fast and lightweight pure Python library for splitting text into semantically meaningful chunks.
-
text-splitter
Split text into semantic chunks, up to a desired chunk size. Supports calculating length by characters and tokens, and is callable from Rust and Python.
semchunk is 77.35% faster than the semantic-text-splitter Python library. It is also implemented entirely in Python, whereas semantic-text-splitter is written in Rust, so semchunk is compatible with PyPy.
Owing to its complex yet highly efficient chunking algorithm, semchunk is more semantically accurate than LangChain's RecursiveCharacterTextSplitter.
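To make the idea of splitting text into chunks "up to a desired chunk size" concrete, here is a minimal sketch of recursive separator-based chunking in plain Python. It is not semchunk's actual algorithm or API: the function name `recursive_chunk`, the `SEPARATORS` hierarchy, and the `length` parameter are all hypothetical and chosen only for illustration. The sketch splits on the coarsest separator found, greedily merges neighbouring pieces while they still fit, and recurses into any piece that is still too long.

```python
from typing import Callable, List, Sequence

# Hypothetical separator hierarchy, ordered from coarsest to finest.
SEPARATORS = ("\n\n", "\n", ". ", " ")


def recursive_chunk(
    text: str,
    chunk_size: int,
    length: Callable[[str], int] = len,
    separators: Sequence[str] = SEPARATORS,
) -> List[str]:
    """Split `text` into chunks for which length(chunk) <= chunk_size.

    Illustrative only; this is an assumption-laden sketch, not semchunk's code.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    text = text.strip()
    if not text:
        return []
    if length(text) <= chunk_size or len(text) <= 1:
        return [text]

    # Pick the coarsest separator that actually occurs in the text.
    for i, sep in enumerate(separators):
        if sep in text:
            parts = [p for p in text.split(sep) if p.strip()]
            finer = separators[i + 1:]
            break
    else:
        # No separator left: fall back to halving the text by characters.
        mid = len(text) // 2
        return (recursive_chunk(text[:mid], chunk_size, length, separators)
                + recursive_chunk(text[mid:], chunk_size, length, separators))

    chunks: List[str] = []
    current = ""
    for part in parts:
        # Try to merge this piece into the chunk being built.
        candidate = part if not current else current + sep + part
        if length(candidate) <= chunk_size:
            current = candidate
            continue
        if current:
            chunks.append(current)
            current = ""
        if length(part) <= chunk_size:
            current = part
        else:
            # Piece is still too long: split it on finer separators.
            chunks.extend(recursive_chunk(part, chunk_size, length, finer))
    if current:
        chunks.append(current)
    return chunks


sample = "First paragraph. It has two sentences.\n\nSecond paragraph is here."
for chunk in recursive_chunk(sample, chunk_size=30):
    print(repr(chunk))
```

Passing a different `length` callable, for example a word or token counter, changes how size is measured without touching the splitting logic, which mirrors the "characters or tokens" distinction in the library descriptions above.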