Python text-mining

Open-source Python projects categorized as text-mining

Top 10 Python text-mining Projects

  1. trafilatura

    Python & Command-line tool to gather text and metadata on the Web: Crawling, scraping, extraction, output as CSV, JSON, HTML, MD, TXT, XML

    Project mention: Trafilatura: A tool and library to gather text and metadata on the Web | news.ycombinator.com | 2025-05-28
  2. texthero

    Text preprocessing, representation and visualization from zero to hero.

  3. scattertext

    Beautiful visualizations of how language differs among document types.

  4. rake-nltk

    Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

  5. huspacy

    HuSpaCy: industrial-strength Hungarian natural language processing

  6. trrex

    Efficient string matching with regular expressions

  7. orange3-text

    🍊 📄 Text Mining add-on for Orange3

  8. llmine_core

    Your Platform for Text Mining through Configurable LLM Chains. Ideal for Developers and Semi-Technical Users

  9. Answerable

    Recommendation system for Stack Overflow unanswered questions

  10. corona-ml

    Machine learning to text-mine coronavirus research for CoronaCentral.ai

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
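
For readers unfamiliar with the Rapid Automatic Keyword Extraction (RAKE) algorithm named in the rake-nltk entry, the following is a minimal pure-Python sketch of the general idea: split text into candidate phrases at stopwords and punctuation, score each word by degree divided by frequency, and sum the word scores per phrase. It is not rake-nltk's code and does not use NLTK; the names `STOPWORDS`, `candidate_phrases` and `rake_scores`, the tiny stopword set, and the simple regex tokenizer are all assumptions made for illustration.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword set (an assumption; real tools use far larger lists).
STOPWORDS = {"a", "an", "and", "the", "is", "are", "of", "in", "on", "to", "for", "uses", "with"}


def candidate_phrases(text, stopwords):
    """Split text into runs of consecutive non-stopwords (candidate keywords)."""
    phrases = []
    # Punctuation always ends a candidate phrase.
    for fragment in re.split(r"[.,;:!?()\n]+", text.lower()):
        current = []
        for word in re.findall(r"[a-z0-9'-]+", fragment):
            if word in stopwords:
                # A stopword also ends the current candidate phrase.
                if current:
                    phrases.append(tuple(current))
                    current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text, stopwords=STOPWORDS):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)    # how often each word occurs in candidates
    degree = defaultdict(int)  # summed length of the phrases each word occurs in
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    word_score = {w: degree[w] / freq[w] for w in freq}
    # A phrase's score is the sum of its word scores.
    scores = {" ".join(p): sum(word_score[w] for w in p) for p in set(phrases)}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    sample = "Keyword extraction is a task of text mining. Text mining uses keyword extraction."
    for phrase, score in rake_scores(sample):
        print(f"{score:.1f}  {phrase}")
```

Because word scores grow with phrase length, longer multi-word candidates tend to rank above single words, which is the behaviour RAKE is usually chosen for.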

Python text-mining related posts

  • [Q] Does anyone use R to code qualitative data?

    3 projects | /r/rstats | 16 Oct 2022
  • Language Input: a new web app for finding content to watch in your target language and keep track of your vocabulary

    4 projects | /r/languagelearning | 24 Dec 2021
  • France: starting January 15, the health pass will be invalid "seven months after the last injection" in the absence of a booster dose

    1 project | /r/LockdownSkepticism | 27 Nov 2021
  • rake-nltk 1.0.6 released. Comes with the flexibility to choose your own sentence and word tokenizers.

    1 project | /r/Python | 15 Sep 2021
  • rake-nltk 1.0.6 released. Comes with the flexibility to choose your own sentence and word tokenizers.

    1 project | /r/textdatamining | 15 Sep 2021
  • rake-nltk 1.0.6 released. Comes with the flexibility to choose your own sentence and word tokenizers.

    1 project | /r/LanguageTechnology | 15 Sep 2021
  • [N] UK PhD Opportunity: Text mining the impact of SARS-CoV-2 mutations from the research literature at University of Glasgow

    1 project | /r/MachineLearning | 29 Jul 2021

Index

What are some of the best open-source text-mining projects in Python? This list will help you:

#    Project        Stars
1    trafilatura    4,617
2    texthero       2,905
3    scattertext    2,311
4    rake-nltk      1,067
5    huspacy          171
6    trrex            145
7    orange3-text     133
8    llmine_core       37
9    Answerable        16
10   corona-ml         10

