Top 5 Python Summarizer Projects
-
tldw
tl/dw (Too Long, Didn't Watch): Your Personal Research Multi-Tool - a naive attempt at 'A Young Lady's Illustrated Primer' (Open Source NotebookLM)
Project mention: Show HN: Morphik – Open-source RAG that understands PDF images, runs locally | news.ycombinator.com | 2025-04-22
Hey yes, I’m building exactly that.
https://github.com/rmusser01/tldw
I first built a POC in Gradio and am now rebuilding it as a FastAPI app. The media processing endpoints work, but I’m still tweaking media ingestion to allow syncing to clients (the idea is to allow for a client-first design).
-
wdoc
Summarize and query from a lot of heterogeneous documents. Any LLM provider, any filetype, scalable (?), WIP
Project mention: Show HN: Sort lines semantically using LLM-sort | news.ycombinator.com | 2025-02-10
I recently built a semantic batching function for my RAG system [wdoc](https://github.com/thiswillbeyourgithub/wdoc/) that might be interesting to others. The system splits a corpus into chunks, finds relevant ones via embeddings, and answers questions for each chunk in parallel before aggregating the answers.
To optimize performance and reduce LLM distraction, instead of aggregating answers two by two, it does batched aggregation. The key innovation is in the batching order - I implemented a [semantic_batching function](https://github.com/thiswillbeyourgithub/wdoc/blob/18bc52128f...) that uses hierarchical clustering on the embeddings and orders texts by leaf order.
The implementation was straightforward, runs very fast and produces great results. The function is designed to be usable as a standalone tool for others to experiment with.
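The batching idea described in this mention can be sketched in a few lines. This is only an illustrative sketch, not wdoc's actual code: the function names (`cosine_distance`, `leaf_order`, `semantic_batches`), the choice of cosine distance with average linkage, and the use of the plain dendrogram leaf order are all assumptions made here for the example. It uses only the standard library and assumes non-zero embedding vectors.

```python
import math
from itertools import combinations


def cosine_distance(a, b):
    """Cosine distance between two non-zero vectors (assumed metric)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def leaf_order(embeddings):
    """Return item indices in the leaf order of an average-linkage dendrogram.

    Naive agglomerative clustering: repeatedly merge the two closest
    clusters. Each cluster keeps its leaves in left-to-right order, so the
    final cluster is the leaf order. Fine for small inputs only.
    """
    clusters = [[i] for i in range(len(embeddings))]
    while len(clusters) > 1:
        best = None
        for i, j in combinations(range(len(clusters)), 2):
            # Average pairwise distance between the two clusters.
            total = sum(
                cosine_distance(embeddings[a], embeddings[b])
                for a in clusters[i]
                for b in clusters[j]
            )
            dist = total / (len(clusters[i]) * len(clusters[j]))
            if best is None or dist < best[0]:
                best = (dist, i, j)
        _, i, j = best
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0] if clusters else []


def semantic_batches(texts, embeddings, batch_size):
    """Order texts by dendrogram leaf order, then cut into fixed-size batches."""
    ordered = [texts[i] for i in leaf_order(embeddings)]
    return [ordered[k:k + batch_size] for k in range(0, len(ordered), batch_size)]


if __name__ == "__main__":
    # Toy 2-D "embeddings" standing in for real model output.
    texts = ["cat", "stock", "kitten", "bond"]
    vectors = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
    print(semantic_batches(texts, vectors, batch_size=2))
```

Because neighbouring leaves of the dendrogram are semantically close, each batch handed to the aggregation step tends to contain related answers. A real pipeline would likely swap the naive loop for a library routine and real embeddings.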
-
CX_DB8
A contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (BERT, Universal Sentence Encoder, Flair)
-
reddit-thread-summarizer
A Reddit thread summarizer is a tool that generates a summary of the main points or themes discussed in a Reddit thread
Index
What are some of the best open-source Summarizer projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | sumy | 3,580 |
2 | tldw | 736 |
3 | wdoc | 448 |
4 | CX_DB8 | 229 |
5 | reddit-thread-summarizer | 17 |