Chonkie Alternatives
Similar projects and alternatives to chonkie
swirl-search
Swirl is an open-source search platform that uses AI to search multiple content and data sources simultaneously and return AI-ranked results. It also uses LLMs to summarize the answers from your searches. It's a one-click, easy-to-use Retrieval-Augmented Generation (RAG) solution.
-
MindsDB
AI's query engine: a platform for building AI that can learn and answer questions over large-scale federated data.
-
R2R
State-of-the-art, production-ready AI retrieval system: agentic Retrieval-Augmented Generation (RAG) with a RESTful API.
-
vlm-api
REST API for computing cross-modal similarity between images and text using the ColPali vision-language model
-
quivr
Opinionated RAG for integrating GenAI into your apps 🧠 Focus on your product rather than the RAG. Easy integration into existing products, with customisation! Any LLM: GPT-4, Groq, Llama. Any vector store: PGVector, Faiss. Any files. Any way you want.
chonkie discussion
chonkie reviews and mentions
Show HN: Fast and Quality Code Chunking with Chonkie
Hi HN,
We’re Chonkie (https://github.com/chonkie-inc/chonkie). We build open-source tools that split documents into meaningful chunks for use with AI models.
When you use LLMs over large documents or codebases, you often need to break them into smaller parts to fit the model’s context window. Our chunkers do this in a smart way: they preserve structure and meaning, so only the most relevant pieces are passed into the model. This reduces hallucinations, avoids confusion, and improves performance and accuracy.
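The general idea of cutting a document down to fit a context window can be illustrated with a generic token-budget chunker. This is a minimal sketch, not Chonkie's API or algorithm: it assumes whitespace-separated words approximate model tokens, and `chunk_text` and `count_tokens` are hypothetical names. It keeps whole paragraphs together where it can and only falls back to fixed word windows when a single paragraph exceeds the budget.

```python
def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words approximate model tokens.
    return len(text.split())


def chunk_text(text: str, max_tokens: int) -> list[str]:
    """Greedily pack whole paragraphs into chunks of at most max_tokens.

    Paragraph boundaries are preferred; a paragraph longer than the
    budget falls back to fixed-size word windows.
    """
    if max_tokens < 1:
        raise ValueError("max_tokens must be positive")

    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]
    chunks: list[str] = []
    current: list[str] = []
    current_tokens = 0

    def flush() -> None:
        # Emit the paragraphs collected so far as one chunk.
        nonlocal current, current_tokens
        if current:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0

    for para in paragraphs:
        n = count_tokens(para)
        if n > max_tokens:
            # Oversized paragraph: emit pending text, then cut it into word windows.
            flush()
            words = para.split()
            for i in range(0, len(words), max_tokens):
                chunks.append(" ".join(words[i:i + max_tokens]))
        elif current_tokens + n > max_tokens:
            # Adding this paragraph would overflow: start a new chunk with it.
            flush()
            current, current_tokens = [para], n
        else:
            current.append(para)
            current_tokens += n
    flush()
    return chunks
```

A real chunker would count tokens with the target model's tokenizer, since word counts and token counts can differ substantially.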
Today we’re launching our Code Chunker: a fast, structure-aware way to break source code down into high-quality, token-aware chunks.
How it works:
(See the code: https://github.com/chonkie-inc/chonkie/blob/main/src/chonkie...)
Code Chunker uses tree-sitter (https://tree-sitter.github.io/tree-sitter/) to parse your code into an abstract syntax tree (AST). It then recursively merges and groups nodes in a way that respects both code structure and token limits.
It supports all languages that tree-sitter supports and is designed to preserve formatting and semantics. Large functions or class definitions won’t be split in the middle of a block. Instead, we descend recursively into the AST to produce clean, coherent chunks that fit your configured token budget.
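The recursive grouping described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Chonkie's code: Python's built-in `ast` module stands in for tree-sitter (so it only parses Python), whitespace-separated words stand in for a real tokenizer, and `chunk_python_source` and `count_tokens` are hypothetical names. Statements that fit the budget are kept whole; an oversized statement with nested statements (a function, class, `if`, `try`, and so on) is split at its child statements, and adjacent pieces are then merged greedily up to the budget.

```python
import ast


def count_tokens(text: str) -> int:
    # Assumption: whitespace-separated words stand in for a real tokenizer.
    return len(text.split())


def chunk_python_source(source: str, max_tokens: int) -> list[str]:
    """Split Python source into line-aligned chunks of about max_tokens.

    Uses the stdlib `ast` module as a stand-in for tree-sitter.
    """
    lines = source.splitlines(keepends=True)

    def text(span: tuple[int, int]) -> str:
        start, end = span  # 1-based, inclusive line numbers
        return "".join(lines[start - 1:end])

    def split(nodes: list, start: int, end: int) -> list[tuple[int, int]]:
        # Each node owns the lines after its previous sibling up to its own
        # last line, so comments, blank lines, decorators and block headers
        # stay attached to the code that follows them.
        pieces = []
        prev_end = start - 1
        for i, node in enumerate(nodes):
            owned_end = end if i == len(nodes) - 1 else node.end_lineno
            if owned_end <= prev_end:
                continue  # shares a line with its sibling, e.g. `a = 1; b = 2`
            span = (prev_end + 1, owned_end)
            children = sorted(
                (c for c in ast.iter_child_nodes(node)
                 if isinstance(c, (ast.stmt, ast.excepthandler))),
                key=lambda c: c.lineno,
            )
            if count_tokens(text(span)) > max_tokens and children:
                # Too big: descend into the nested statements instead.
                pieces.extend(split(children, span[0], span[1]))
            else:
                pieces.append(span)
            prev_end = owned_end
        return pieces

    module = ast.parse(source)
    if not module.body:
        return [source] if source.strip() else []
    pieces = split(module.body, 1, len(lines))

    # Greedily merge adjacent pieces while the merged chunk fits the budget.
    merged = [pieces[0]]
    for span in pieces[1:]:
        candidate = (merged[-1][0], span[1])
        if count_tokens(text(candidate)) <= max_tokens:
            merged[-1] = candidate
        else:
            merged.append(span)
    return [text(span) for span in merged]
```

Because every chunk is a contiguous run of whole lines, joining the chunks reproduces the original source exactly. A leaf statement that alone exceeds the budget is emitted as an oversized chunk rather than cut mid-statement; a production chunker would need its own policy for that case.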
What it’s useful for:
- Embedding-based code search
-
Show HN: Chonkie Cloud – No-nonsense chunking, now in the cloud
We launched Chonkie as an open-source project late last year. A few weeks ago, we decided to go full-time on it. Unfortunately, this shift wasn’t as smooth as we had hoped. Due to some legal stuff, we had to rebuild the entire project from scratch in a new repo.
Restarting sucked but it gave us the chance to clean things up and build something faster, cleaner, and better. You can check out the new repo here: https://github.com/chonkie-inc/chonkie
Stats
chonkie-inc/chonkie is an open-source project licensed under the MIT License, which is an OSI-approved license.
The primary programming language of chonkie is Python.