Chunking your data for RAG

This page summarizes the projects mentioned and recommended in the original post on dev.to.

  1. txtai

    💡 All-in-one open-source embeddings database for semantic search, LLM orchestration and language model workflows

  2. chonkie

    🦛 CHONK your texts with Chonkie ✨ - The no-nonsense RAG chunking library

    The Textractor extracts chunks of text from files, and the Embeddings instance takes those chunks and builds an index/database. We'll use a late chunker backed by Chonkie.

  3. txtai.js

    JavaScript client for txtai

  4. txtai.java

    Java client for txtai

  5. txtai.rs

    Rust client for txtai

  6. txtai.go

    Go client for txtai

  7. transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

  8. fastapi

    FastAPI framework, high performance, easy to learn, fast to code, ready for production (originally by tiangolo; moved to https://github.com/fastapi/fastapi)

  9. rag

    🚀 Retrieval Augmented Generation (RAG) with txtai. Combine search and LLMs to find insights with your own data.

  10. ragdata

    📚 Build knowledge bases for RAG

  11. paperai

    📄 🤖 Semantic search and workflows for medical/scientific papers

  12. annotateai

    📝 Automatically annotate papers using LLMs
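
The chonkie entry above describes a two-step flow: split text into chunks, then build a searchable index from those chunks. The sketch below illustrates that flow with the Python standard library only. It is not the txtai or Chonkie API, and it uses a plain fixed-size chunker with word overlap rather than late chunking. The names `chunk_text` and `ToyIndex` are hypothetical, and bag-of-words cosine similarity stands in for real embeddings.

```python
import math
import re
from collections import Counter


def chunk_text(text, max_words=40, overlap=10):
    """Split text into overlapping windows of words (simple fixed-size chunking)."""
    if overlap >= max_words:
        raise ValueError("overlap must be smaller than max_words")
    words = text.split()
    step = max_words - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + max_words]))
        # Stop once the current window already reaches the end of the text.
        if start + max_words >= len(words):
            break
    return chunks


def tokenize(text):
    """Lowercase the text and keep alphanumeric tokens only."""
    return re.findall(r"[a-z0-9]+", text.lower())


class ToyIndex:
    """Hypothetical stand-in for an embeddings database (bag-of-words vectors)."""

    def __init__(self):
        self.rows = []

    def index(self, chunks):
        # Store each chunk together with its term-count vector.
        self.rows = [(chunk, Counter(tokenize(chunk))) for chunk in chunks]

    def search(self, query, limit=1):
        # Score every chunk against the query and return the best matches.
        q = Counter(tokenize(query))
        scored = [(self._cosine(q, vec), chunk) for chunk, vec in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]

    @staticmethod
    def _cosine(a, b):
        dot = sum(a[t] * b[t] for t in a)
        norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
            sum(v * v for v in b.values())
        )
        return dot / norm if norm else 0.0
```

A typical use would be `index = ToyIndex(); index.index(chunk_text(document)); index.search("question")`. The overlap keeps a sentence that straddles a boundary visible in both neighboring chunks, which is one common reason to prefer overlapping windows over hard cuts.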

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • Introducing the Overflow Offline project

    4 projects | /r/programming | 21 Oct 2022
  • [P] Stack Overflow Semantic Search

    1 project | /r/MachineLearning | 7 Oct 2022
  • Semantic search of Stack Overflow with codequestion

    1 project | /r/opensource | 6 Oct 2022
  • Semantic search of Stack Overflow with codequestion

    1 project | /r/coolgithubprojects | 6 Oct 2022
