Python Summarizer

Open-source Python projects categorized as Summarizer

Top 5 Python Summarizer Projects

  1. sumy

    Module for automatic summarization of text documents and HTML pages.

  2. tldw

    tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer' (Open Source NotebookLM)

    Project mention: Show HN: Morphik – Open-source RAG that understands PDF images, runs locally | news.ycombinator.com | 2025-04-22

    Hey yes, I’m building exactly that.

    https://github.com/rmusser01/tldw

    I first built a POC in Gradio and am now rebuilding it as a FastAPI app. The media processing endpoints work, but I’m still tweaking media ingestion to allow for syncing to clients (the idea is to allow for a client-first design).

  3. wdoc

    Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, scalable (?), WIP

    Project mention: Show HN: Sort lines semantically using LLM-sort | news.ycombinator.com | 2025-02-10

    I recently built a semantic batching function for my RAG system [wdoc](https://github.com/thiswillbeyourgithub/wdoc/) that might be interesting to others. The system splits a corpus into chunks, finds relevant ones via embeddings, and answers questions for each chunk in parallel before aggregating the answers.

    To optimize performance and reduce LLM distraction, instead of aggregating answers two by two, it does batched aggregation. The key innovation is in the batching order - I implemented a [semantic_batching function](https://github.com/thiswillbeyourgithub/wdoc/blob/18bc52128f...) that uses hierarchical clustering on the embeddings and orders texts by leaf order.

    The implementation was straightforward, runs very fast and produces great results. The function is designed to be usable as a standalone tool for others to experiment with.

  4. CX_DB8

    A contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)

  5. reddit-thread-summarizer

    A Reddit thread summarizer is a tool that generates a summary of the main points or themes discussed in a Reddit thread.

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
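The wdoc mention describes a retrieve-then-answer pipeline: split the corpus into chunks, keep the chunks whose embeddings are closest to the question, answer the question against each chunk in parallel, then aggregate the partial answers. The sketch below illustrates that flow under loud assumptions: `embed` is a toy bag-of-words stand-in for a real embedding model, and `answer_chunk` and `aggregate` are hypothetical placeholders for LLM calls. None of these names come from wdoc's code.

```python
import math
import re
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a hypothetical stand-in for a real embedding model.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def split_into_chunks(corpus: str, size: int) -> list[str]:
    # Fixed-size word windows; real systems usually split on document structure.
    words = corpus.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def answer_chunk(question: str, chunk: str) -> str:
    # Hypothetical placeholder for a per-chunk LLM call: returns the chunk's first sentence.
    return chunk.split(".")[0].strip()


def aggregate(answers: list[str]) -> str:
    # Hypothetical placeholder for the LLM aggregation step: joins the partial answers.
    return " | ".join(answers)


def ask(corpus: str, question: str, top_k: int = 3, chunk_size: int = 40) -> str:
    chunks = split_into_chunks(corpus, chunk_size)
    q = embed(question)
    # Rank chunks by similarity to the question; keep the top_k that overlap at all.
    scored = sorted(((cosine(embed(c), q), c) for c in chunks), reverse=True)
    relevant = [c for score, c in scored[:top_k] if score > 0]
    # Answer each relevant chunk concurrently; pool.map() preserves input order.
    with ThreadPoolExecutor() as pool:
        answers = list(pool.map(lambda c: answer_chunk(question, c), relevant))
    return aggregate(answers)
```

For example, `ask("Cats purr softly. Dogs bark loudly. Fish swim quietly.", "Why do dogs bark?", chunk_size=3)` keeps only the dog chunk. The threads matter only when `answer_chunk` is I/O-bound, as a remote LLM call would be.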
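The semantic batching idea from the wdoc mention can be sketched in plain Python: cluster the answer embeddings hierarchically, read the items off in dendrogram leaf order so that similar texts sit next to each other, then cut that order into fixed-size batches for aggregation. This is not wdoc's `semantic_batching` function. It assumes average linkage over cosine distance, and `leaf_order` and `semantic_batches` are hypothetical names; a production version would more likely use SciPy's `scipy.cluster.hierarchy.linkage` and `leaves_list`.

```python
import math


def cosine_distance(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (na * nb) if na and nb else 1.0


def leaf_order(embeddings: list[list[float]]) -> list[int]:
    """Average-linkage agglomerative clustering; returns indices in dendrogram leaf order."""
    dist = [[cosine_distance(a, b) for b in embeddings] for a in embeddings]
    # Each cluster is a list of item indices, already in leaf order.
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average linkage: mean pairwise distance between the two clusters.
                pairs = [dist[p][q] for p in clusters[i] for q in clusters[j]]
                d = sum(pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        # Concatenating the two children keeps each subtree's leaves contiguous.
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0] if clusters else []


def semantic_batches(texts: list[str], embeddings: list[list[float]], batch_size: int) -> list[list[str]]:
    # Hypothetical helper: reorder texts by leaf order, then cut into consecutive batches.
    ordered = [texts[i] for i in leaf_order(embeddings)]
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]
```

With texts `["cat", "car", "dog", "truck"]` and 2-D embeddings that place cat near dog and car near truck, batches of size 2 pair the similar items together instead of following input order. The naive merge loop is roughly cubic in the number of items, which is acceptable for a few hundred answers but not for large corpora.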
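CX_DB8 is described as a biasable extractive summarizer built on text embeddings. One plausible reading of "biasable" is: embed each unit of text and a user-supplied query, rank the units by similarity to that query, and keep the top ones in their original order. The sketch below does this at sentence granularity only, with a toy bag-of-words `embed` standing in for BERT, Universal Sentence Encoder or Flair embeddings; `biased_extract` and its `keep` parameter are hypothetical names, not CX_DB8's API.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    # Toy bag-of-words vector; a hypothetical stand-in for BERT/USE/Flair embeddings.
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(count * b[token] for token, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def biased_extract(text: str, query: str, keep: int = 2) -> list[str]:
    # Split into sentences at ., ! or ? followed by whitespace.
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    q = embed(query)
    scores = [cosine(embed(s), q) for s in sentences]
    # Pick the `keep` sentences most similar to the query (the "bias")...
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:keep]
    # ...and return them in their original document order.
    return [sentences[i] for i in sorted(top)]
```

Changing the query changes which sentences survive, which is the point of a biasable summarizer; swapping the sentence splitter for a word or paragraph splitter would give the other granularities the project description mentions.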


Index

What are some of the best open-source Summarizer projects in Python? This list will help you:

Rank  Project                   Stars
1     sumy                      3,580
2     tldw                        736
3     wdoc                        448
4     CX_DB8                      229
5     reddit-thread-summarizer     17

