Python Summarization

Open-source Python projects categorized as Summarization

Top 23 Python Summarization Projects

  • haystack

    LLM orchestration framework to build customizable, production-ready LLM applications. Connect components (models, vector DBs, file converters) to pipelines or agents that can interact with your data. With advanced retrieval methods, it's best suited for building RAG, question answering, semantic search or conversational agent chatbots.

  • Project mention: Release Radar • March 2024 Edition | dev.to | 2024-04-07

  • sumy

    Module for automatic summarization of text documents and HTML pages.

  • pytextrank

    Python implementation of TextRank algorithms ("textgraphs") for phrase extraction

  • RL4LMs

    A modular RL library to fine-tune language models to human preferences

  • Project mention: How To Setup a Model With Guardrails? | /r/LocalLLaMA | 2023-05-12

    I think of guardrails as another dimension of human preferences: whether you are training a model to answer questions more gooder or avoid saying horrifying stuff, you are teaching the model a preference. So I think it's a straightforward RLHF problem but from a different perspective.

  • LLM-Finetuning-Toolkit

    Toolkit for fine-tuning, ablating and unit-testing open-source LLMs.

  • Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07

  • simpleT5

    simpleT5 is built on top of PyTorch-lightning⚡️ and Transformers🤗 and lets you quickly train your T5 models.

  • CX_DB8

    A contextual, biasable extractive summarizer that works at the word, sentence, or paragraph level, powered by text embeddings (BERT, Universal Sentence Encoder, Flair)

  • Project mention: Ask HN: What have you built with LLMs? | news.ycombinator.com | 2024-02-05

    I was working on this stuff before it was cool, so in the sense of the precursor to LLMs (and sometimes supporting LLMs still) I've built many things:

    1. Games you can play with word2vec or related models (could be drop in replaced with sentence transformer). It's crazy that this is 5 years old now: https://github.com/Hellisotherpeople/Language-games

    2. "Constrained Text Generation Studio" - A research project I wrote when I was trying to solve LLMs' inability to follow syntactic, phonetic, or semantic constraints: https://github.com/Hellisotherpeople/Constrained-Text-Genera...

    3. DebateKG - A bunch of "Semantic Knowledge Graphs" built on my pet debate evidence dataset (LLM backed embeddings indexes synchronized with a graphDB and a sqlDB via txtai). Can create compelling policy debate cases https://github.com/Hellisotherpeople/DebateKG

    4. My failed attempt at a good extractive summarizer. My life work is dedicated to one day solving the problems I tried to fix with this project: https://github.com/Hellisotherpeople/CX_DB8

  • summarizepaper

    An AI-powered arXiv paper summarization website with a virtual assistant for answering questions.

  • textsum

    CLI & Python API to easily summarize text-based files with transformers

  • Project mention: Training on documents to summarize them. | /r/LocalLLaMA | 2023-06-16

    See https://github.com/pszemraj/textsum. He's the guy who trained most of the popular fine-tuned long models. He created a pip package to make life easier (it uses Hugging Face under the hood, just pre-selects good models and hides the boilerplate).

  • factsumm

    FactSumm: Factual Consistency Scorer for Abstractive Summarization

  • ctc-gen-eval

    EMNLP 2021 - CTC: A Unified Framework for Evaluating Natural Language Generation

  • targetedSummarization

    TextReducer - A Tool for Summarization and Information Extraction

  • summarizers

    Package for controllable summarization

  • BooookScore

    A package to generate summaries of long-form text and evaluate the coherence of these summaries. Official package for our ICLR 2024 paper, "BooookScore: A systematic exploration of book-length summarization in the era of LLMs".

  • Project mention: Evaluating faithfulness and content selection of LLMs in book-length summaries | news.ycombinator.com | 2024-04-09

    With a link to https://arxiv.org/pdf/2310.00785.pdf - which then links to another GitHub repository, https://github.com/lilakk/BooookScore which has a bunch of prompts in https://github.com/lilakk/BooookScore/tree/main/prompts

    Which makes me think that this original paper isn't evaluating LLMs so much as it's evaluating that one particular prompting technique for long summaries.

    Gemini 1.5 Pro has a 1M-token context length, which should remove the need for weird hierarchical summary tricks. I wonder how well it would score?

  • Auto-Research

    Generates a custom, detailed survey paper with topic-clustered sections and proper citations from a single query in under 30 minutes.

  • SelSum

    Abstractive opinion summarization system (SelSum) and the largest dataset of Amazon product summaries (AmaSum). EMNLP 2021 conference paper.

  • Text-Summarization-using-NLP

    Text summarization using NLP: fetches a BBC News article and summarizes its text, and also supports summarizing custom articles.

  • bert2bert-summarization

    Abstractive summarization using Bert2Bert framework.

  • tldwol

    Web API that summarizes multimedia from various sources using modern AI tools.

  • Project mention: Show HN: TL;DWOL – Summarize videos or audio on your machine using AI | news.ycombinator.com | 2023-09-24

  • LTC-SUM

    Implementation of LTC-SUM: Lightweight Client-driven Personalized Video Summarization Framework Using 2D CNN

  • Project mention: LTC-Sum: Client-Driven Personalized Video Summarization Framework Using 2D CNN | news.ycombinator.com | 2023-05-10

  • youtube-ai-assistant

    AI assistant that summarizes YouTube videos and gives instant answers to your questions about a video.

  • Project mention: YouTube AI Assistant: Chat with Any YouTube Video | news.ycombinator.com | 2023-10-23

  • slashml-python-client

    SlashML Python Client

  • Project mention: Transcribe YouTube Videos in Bulk | news.ycombinator.com | 2023-07-03

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
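Several entries above are extractive summarizers (sumy, pytextrank, CX_DB8). As a rough illustration of the graph-based approach, here is a minimal, stdlib-only sketch of sentence-level TextRank (Mihalcea and Tarau, 2004). It does not use or reproduce the API of sumy or pytextrank; the regex sentence splitter, the tokenizer, and all function names are illustrative assumptions.

```python
import math
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter (assumption): break after '.', '!' or '?' plus whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(sentence: str) -> List[str]:
    """Lowercase word tokens; no stemming or stop-word removal."""
    return re.findall(r"[a-z0-9']+", sentence.lower())


def similarity(a: List[str], b: List[str]) -> float:
    """TextRank overlap: shared words / (log|a| + log|b|)."""
    common = len(set(a) & set(b))
    if common == 0:
        return 0.0
    denom = math.log(len(a)) + math.log(len(b))
    return common / denom if denom > 0 else 0.0


def textrank_summary(text: str, k: int = 2, damping: float = 0.85,
                     iterations: int = 50) -> List[str]:
    """Return the k highest-ranked sentences, in their original order."""
    sentences = split_sentences(text)
    if len(sentences) <= k:
        return sentences
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    # Symmetric, weighted sentence graph with no self-loops.
    weights = [[similarity(tokens[i], tokens[j]) if i != j else 0.0
                for j in range(n)] for i in range(n)]
    out_weight = [sum(row) for row in weights]
    scores = [1.0] * n
    # Weighted PageRank, run for a fixed number of iterations.
    for _ in range(iterations):
        scores = [
            (1 - damping) + damping * sum(
                weights[j][i] / out_weight[j] * scores[j]
                for j in range(n) if out_weight[j] > 0)
            for i in range(n)
        ]
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


if __name__ == "__main__":
    doc = ("Cats are small animals. Cats and dogs are popular pets. "
           "Dogs are loyal animals. The stock market fell today.")
    print(textrank_summary(doc, k=2))
```

In the usage example, the off-topic last sentence shares no words with the others, so it stays at the minimum score and is not selected. Production libraries add proper sentence segmentation, stop-word handling, and convergence checks; this sketch simply runs a fixed number of iterations.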

Index

What are some of the best open-source Summarization projects in Python? This list will help you:

#   Project                       Stars
1   haystack                      13,633
2   sumy                          3,417
3   pytextrank                    2,098
4   RL4LMs                        2,084
5   LLM-Finetuning-Toolkit        659
6   dr-doc-search                 601
7   simpleT5                      381
8   CX_DB8                        222
9   summarizepaper                219
10  textsum                       110
11  factsumm                      105
12  ctc-gen-eval                  93
13  targetedSummarization         87
14  summarizers                   76
15  BooookScore                   65
16  Auto-Research                 48
17  SelSum                        44
18  Text-Summarization-using-NLP  33
19  bert2bert-summarization       30
20  tldwol                        23
21  LTC-SUM                       17
22  youtube-ai-assistant          5
23  slashml-python-client         4
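The BooookScore discussion in the list above mentions hierarchical summarization as a workaround for books that exceed a model's context window. A minimal sketch of the general idea follows, assuming a caller-supplied `summarize` function (for example, a wrapper around an LLM call; hypothetical here). It is not BooookScore's implementation, and chunking by word count is a simplifying assumption in place of real token counting.

```python
from typing import Callable, List

# Hypothetical signature: any function that maps text to a shorter summary.
Summarizer = Callable[[str], str]


def chunk_words(text: str, max_words: int) -> List[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words])
            for i in range(0, len(words), max_words)]


def hierarchical_summary(text: str, summarize: Summarizer,
                         max_words: int = 500, group_size: int = 4) -> str:
    """Summarize each chunk, then repeatedly merge groups of summaries."""
    if max_words < 1 or group_size < 2:
        raise ValueError("max_words must be >= 1 and group_size >= 2")
    # Level 0: summarize each chunk independently.
    summaries = [summarize(chunk) for chunk in chunk_words(text, max_words)]
    # Higher levels: join up to group_size summaries and summarize the result.
    # Each pass shrinks the list, so the loop terminates. Assumes the joined
    # summaries still fit within the model's input budget.
    while len(summaries) > 1:
        summaries = [
            summarize(" ".join(summaries[i:i + group_size]))
            for i in range(0, len(summaries), group_size)
        ]
    return summaries[0] if summaries else ""


if __name__ == "__main__":
    # Toy stand-in for a real model: keep the first five words.
    first_five = lambda t: " ".join(t.split()[:5])
    book = " ".join(f"w{i}" for i in range(100))
    print(hierarchical_summary(book, first_five, max_words=10, group_size=3))
    # prints: w0 w1 w2 w3 w4
```

Requiring `group_size >= 2` guarantees that each merge level has fewer summaries than the one before, which is what makes the loop finish; a long-context model, as the comment above suggests, would simply need fewer levels.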
