Python text-mining

Open-source Python projects categorized as text-mining

Top 11 Python text-mining Projects

  • texthero

    Text preprocessing, representation and visualization from zero to hero.

  • trafilatura

    Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments

  • Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14

    The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features

    Basically: you could implement all of this on top of BeautifulSoup - polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection... but it would require writing an enormous amount of extra code.

  • scattertext

    Beautiful visualizations of how language differs among document types.

  • rake-nltk

    Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK.

  • huspacy

    HuSpaCy: industrial-strength Hungarian natural language processing

  • trrex

    Efficient string matching with regular expressions (by mesejo)

  • orange3-text

    🍊 📄 Text Mining add-on for Orange3

  • Project mention: Ask HN: What Underrated Open Source Project Deserves More Recognition? | news.ycombinator.com | 2024-03-07
  • intertext

    Detect and visualize text reuse

  • Project mention: Ask HN: Tool to find text reuse, similar paragraphs, fuzzy/near dupes in folder? | news.ycombinator.com | 2023-06-25

    Do you know of any tool that I can use to compare my own notes and documents vault in search of copied paragraphs or nearly identical phrases? Normal diffing/hashing wouldn't work, as we're talking about the contents of slightly modified documents and the comparison of each file against all others.

    I found the following tools that seem related yet not quite there, maybe I'm missing a particular term of art?

    https://github.com/YaleDHLab/intertext

  • llmine_core

    Your Platform for Text Mining through Configurable LLM Chains. Ideal for Developers and Semi-Technical Users

  • Project mention: Show HN: Mine Insights from text using configurable LLM Prompt Chains | news.ycombinator.com | 2023-08-25
  • Answerable

    Recommendation system for Stack Overflow unanswered questions

  • corona-ml

    Machine learning to text-mine coronavirus research for CoronaCentral.ai

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
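
The intertext mention above asks how to find near-duplicate paragraphs when plain diffing or hashing fails and every file must be compared against all others. One common way to approach this, shown here as a minimal standard-library sketch and not as intertext's actual method, is to split each paragraph into overlapping word n-grams ("shingles") and score pairs by Jaccard similarity. The function names, the shingle size of 3, the 0.5 threshold, and the sample notes are all illustrative assumptions.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams (shingles) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < size:
        # Too short for a full shingle: treat the whole text as one unit.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def near_duplicates(paragraphs, threshold=0.5, size=3):
    """Compare every paragraph against all others.

    Returns (index_a, index_b, score) for pairs scoring at or above threshold.
    threshold and size are illustrative defaults, not tuned values.
    """
    sets = [shingles(p, size) for p in paragraphs]
    pairs = []
    for i, j in combinations(range(len(paragraphs)), 2):
        score = jaccard(sets[i], sets[j])
        if score >= threshold:
            pairs.append((i, j, score))
    return pairs


# Hypothetical sample notes: the first two differ by a single inserted word.
notes = [
    "The quick brown fox jumps over the lazy dog near the river bank.",
    "The quick brown fox jumps over the lazy dog near the old river bank.",
    "Text mining extracts structured information from unstructured documents.",
]

for i, j, score in near_duplicates(notes):
    print(f"paragraphs {i} and {j} look similar (Jaccard {score:.2f})")
```

The all-pairs loop is quadratic in the number of paragraphs, which is fine for a small notes folder; for larger collections, approximate techniques such as MinHash with locality-sensitive hashing are the usual next step.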

Python text-mining related posts

  • [Q] Does anyone use R to code qualitative data?

    3 projects | /r/rstats | 16 Oct 2022
  • Language Input: a new web app for finding content to watch in your target language and keep track of your vocabulary

    4 projects | /r/languagelearning | 24 Dec 2021
  • France: starting January 15, the health pass will be invalid "seven months after the last injection" in the absence of a booster dose

    1 project | /r/LockdownSkepticism | 27 Nov 2021
  • rake-nltk 1.0.6 released. Comes with the flexibility to choose your own sentence and word tokenizers.

    1 project | /r/Python | 15 Sep 2021
  • rake-nltk 1.0.6 released. Comes with the flexibility to choose your own sentence and word tokenizers.

    1 project | /r/textdatamining | 15 Sep 2021
  • rake-nltk 1.0.6 released. Comes with the flexibility to choose your own sentence and word tokenizers.

    1 project | /r/LanguageTechnology | 15 Sep 2021
  • [N] UK PhD Opportunity: Text mining the impact of SARS-CoV-2 mutations from the research literature at University of Glasgow

    1 project | /r/MachineLearning | 29 Jul 2021

Index

What are some of the best open-source text-mining projects in Python? This list will help you:

#   Project        Stars
1   texthero       2,865
2   trafilatura    2,898
3   scattertext    2,203
4   rake-nltk      1,034
5   huspacy          148
6   trrex            134
7   orange3-text     124
8   intertext        110
9   llmine_core       31
10  Answerable        15
11  corona-ml          9
