Top 23 Python Summarization Projects
-
haystack
:mag: LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.
-
simpleT5
simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 that lets you quickly train your T5 models.
-
CX_DB8
a contextual, biasable, word-or-sentence-or-paragraph extractive summarizer powered by the latest in text embeddings (Bert, Universal Sentence Encoder, Flair)
-
summarizepaper
An AI-powered arXiv paper summarization website with a virtual assistant for answering questions.
-
BooookScore
A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper, "BooookScore: A systematic exploration of book-length summarization in the era of LLMs".
-
Auto-Research
Generate custom detailed survey paper with topic clustered sections and proper citations, from just a single query in just under 30 mins !!
-
SelSum
Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.
-
Text-Summarization-using-NLP
Text Summarization using NLP to fetch BBC News Article and summarize its text and also it includes custom article Summarization
-
LTC-SUM
Implementation of LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN
-
youtube-ai-assistant
AI Assistant to get summarized text from Youtube video and also get instant answers to your queries related to video
I think of guardrails as another dimension of human preferences: whether you are training a model to answer questions more gooder or avoid saying horrifying stuff, you are teaching the model a preference. So I think it's a straightforward RLHF problem but from a different perspective.
Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07
I was working on this stuff before it was cool, so in the sense of the precursor to LLMs (and sometimes supporting LLMs still) I've built many things:
1. Games you can play with word2vec or related models (could be drop in replaced with sentence transformer). It's crazy that this is 5 years old now: https://github.com/Hellisotherpeople/Language-games
2. "Constrained Text Generation Studio" - A research project I wrote when I was trying to solve LLMs' inability to follow syntactic, phonetic, or semantic constraints: https://github.com/Hellisotherpeople/Constrained-Text-Genera...
3. DebateKG - A bunch of "Semantic Knowledge Graphs" built on my pet debate evidence dataset (LLM backed embeddings indexes synchronized with a graphDB and a sqlDB via txtai). Can create compelling policy debate cases https://github.com/Hellisotherpeople/DebateKG
4. My failed attempt at a good extractive summarizer. My life work is dedicated to one day solving the problems I tried to fix with this project: https://github.com/Hellisotherpeople/CX_DB8
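To make the idea of a query-biased extractive summarizer concrete, here is a minimal sketch. It is not CX_DB8's implementation: CX_DB8 is described as using text embeddings, while this stand-in scores sentences with plain bag-of-words cosine similarity so it runs with the standard library only. The function names (`bow`, `cosine`, `extract`) are hypothetical.

```python
import math
import re
from collections import Counter


def bow(text):
    """Bag-of-words vector: lowercase token counts (a stand-in for an embedding)."""
    return Counter(re.findall(r"[a-z0-9']+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract(text, query, k=2):
    """Return the k sentences most similar to the query, in document order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    q = bow(query)
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: cosine(bow(sentences[i]), q),
        reverse=True,
    )[:k]
    # Restore original order so the extract reads naturally.
    return [sentences[i] for i in sorted(ranked)]


text = (
    "Cats sleep most of the day. "
    "Nuclear policy dominated the debate round. "
    "The weather was mild. "
    "Judges weighed the policy evidence."
)
print(extract(text, "policy debate", k=2))
```

Swapping `bow` for a sentence-embedding model would bring this closer to the embedding-based approach the project describes; the ranking logic would stay the same.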
See https://github.com/pszemraj/textsum. He's the guy who trained most of the popular finetuned long models. He created a pip package to make life easier (it uses Hugging Face under the hood and just pre-selects good models and hides the boilerplate).
Project mention: Evaluating faithfulness and content selection of LLMs in book-length summaries | news.ycombinator.com | 2024-04-09
With a link to https://arxiv.org/pdf/2310.00785.pdf - which then links to another GitHub repository, https://github.com/lilakk/BooookScore which has a bunch of prompts in https://github.com/lilakk/BooookScore/tree/main/prompts
Which makes me think that this original paper isn't evaluating LLMs so much as it's evaluating that one particular prompting technique for long summaries.
Gemini Pro 1.5 has a 1M-token context length, which should remove the need for weird hierarchical summary tricks. I wonder how well it would score?
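For readers unfamiliar with the "hierarchical summary tricks" mentioned here, the rough idea is: split a long text into chunks that fit the model's context, summarize each chunk, join the chunk summaries, and repeat until the result fits in one chunk. Below is a minimal sketch of that loop. It is an illustration only, not the BooookScore prompts or pipeline; `first_sentence` is a toy placeholder for a real model call, and all names here are hypothetical.

```python
from typing import Callable, List


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def first_sentence(text: str) -> str:
    """Toy placeholder summarizer: keep only the first sentence."""
    head, sep, _ = text.partition(". ")
    return head + "." if sep else text


def hierarchical_summary(
    text: str,
    summarize: Callable[[str], str],
    max_words: int = 200,
) -> str:
    """Summarize chunks, merge the summaries, and repeat until one chunk remains."""
    if max_words < 1:
        raise ValueError("max_words must be positive")
    while len(text.split()) > max_words:
        chunks = chunk_words(text, max_words)
        merged = " ".join(summarize(chunk) for chunk in chunks)
        if len(merged.split()) >= len(text.split()):
            break  # the summarizer stopped shrinking the text; avoid looping forever
        text = merged
    return summarize(text)


long_text = "Alpha beta gamma. Delta epsilon. " * 50  # 250 words
print(hierarchical_summary(long_text, first_sentence, max_words=20))
```

In practice `summarize` would wrap an LLM call and `max_words` would be replaced by a token budget. A model with a very long context window could skip the loop entirely, which is the point the comment above is making.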
Project mention: Show HN: TL;DWOL – Summarize videos or audio on your machine using AI | news.ycombinator.com | 2023-09-24
Project mention: LTC-Sum:Client-Driven Personalized Video Summarization Framework Using 2D CNN | news.ycombinator.com | 2023-05-10
Project mention: YouTube AI Assistant: Chat with Any YouTube Video | news.ycombinator.com | 2023-10-23
Python Summarization related posts
- How critical theory is radicalizing high school debate
- Copy is all you need
- Transcribe YouTube Videos in Bulk
- Access machine learning models from different cloud providers
- The only Python SDK for accessing machine learning models from multiple providers.
- Converse with book – Built with GPT-3
- Targeted Summarization - A tool for information extraction
Index
What are some of the best open-source Summarization projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | haystack | 13,633 |
2 | sumy | 3,417 |
3 | pytextrank | 2,098 |
4 | RL4LMs | 2,084 |
5 | LLM-Finetuning-Toolkit | 659 |
6 | dr-doc-search | 601 |
7 | simpleT5 | 381 |
8 | CX_DB8 | 222 |
9 | summarizepaper | 219 |
10 | textsum | 110 |
11 | factsumm | 105 |
12 | ctc-gen-eval | 93 |
13 | targetedSummarization | 87 |
14 | summarizers | 76 |
15 | BooookScore | 65 |
16 | Auto-Research | 48 |
17 | SelSum | 44 |
18 | Text-Summarization-using-NLP | 33 |
19 | bert2bert-summarization | 30 |
20 | tldwol | 23 |
21 | LTC-SUM | 17 |
22 | youtube-ai-assistant | 5 |
23 | slashml-python-client | 4 |