Top 11 Python text-mining Projects
-
trafilatura
Python & command-line tool to gather text on the Web: web crawling/scraping, extraction of text, metadata, comments
-
llmine_core
Your Platform for Text Mining through Configurable LLM Chains. Ideal for Developers and Semi-Technical Users
Project mention: Trafilatura: Python tool to gather text on the Web | news.ycombinator.com | 2023-08-14
The feature list answers that question pretty well: https://github.com/adbar/trafilatura#features
Basically, you could implement all of this on top of BeautifulSoup (polite crawling policies, sitemap and feed parsing, URL de-duplication, parallel processing, download queues, heuristics for extracting just the main article content, metadata extraction, language detection...), but it would require writing an enormous amount of extra code.
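To make the BeautifulSoup comparison concrete, here is a minimal standard-library sketch of just two items from that list: reading URLs out of a sitemap and de-duplicating them. It is not trafilatura's implementation. The normalization rules (lowercasing scheme and host, defaulting an empty path to `/`, dropping fragments, sorting query parameters, and stripping a hypothetical set of `utm_*` tracking parameters) are assumptions chosen for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Namespace used by sitemaps that follow the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

# Assumption: query parameters treated as tracking noise and dropped.
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content"}


def normalize_url(url: str) -> str:
    """Canonicalize a URL so trivially different spellings compare equal."""
    parts = urlsplit(url.strip())
    # Keep non-tracking query parameters, sorted for a stable order.
    query = urlencode(sorted(
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key not in TRACKING_PARAMS
    ))
    # Lowercase scheme and host, default the path to "/", drop the fragment.
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), parts.path or "/", query, ""))


def sitemap_urls(xml_text: str) -> list[str]:
    """Return normalized, de-duplicated <loc> URLs from a <urlset> sitemap."""
    root = ET.fromstring(xml_text)
    seen: set[str] = set()
    urls: list[str] = []
    for loc in root.findall("sm:url/sm:loc", SITEMAP_NS):
        if not loc.text:
            continue
        url = normalize_url(loc.text)
        if url not in seen:  # keep the first occurrence, preserve order
            seen.add(url)
            urls.append(url)
    return urls


if __name__ == "__main__":
    sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://Example.com/a?utm_source=feed</loc></url>
  <url><loc>https://example.com/a#top</loc></url>
  <url><loc>https://example.com/b?y=2&amp;x=1</loc></url>
</urlset>"""
    print(sitemap_urls(sample))
    # ['https://example.com/a', 'https://example.com/b?x=1&y=2']
```

Even this small slice needs decisions about what counts as "the same URL"; politeness, queues, parallelism, and main-content extraction would each add considerably more.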
Project mention: Ask HN: What Underrated Open Source Project Deserves More Recognition? | news.ycombinator.com | 2024-03-07
Project mention: Ask HN: Tool to find text reuse, similar paragraphs, fuzzy/near dupes in folder? | news.ycombinator.com | 2023-06-25
Do you know of any tool that I can use to compare my own notes and documents vault in search of copied paragraphs or nearly similar phrases? Normal diffing/hashing wouldn't work, as we're talking about the contents of slightly modified documents and the comparison of each file against all others.
I found the following tools that seem related yet not quite there. Maybe I'm missing a particular term of art?
https://github.com/YaleDHLab/intertext
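One common baseline for this kind of near-duplicate detection is word shingling with Jaccard similarity: each paragraph becomes the set of its overlapping k-word sequences, and two paragraphs count as near-duplicates when those sets overlap enough. Unlike exact hashing, a one-word edit only disturbs the few shingles that contain that word. The sketch below uses only the standard library and is not how intertext works; splitting paragraphs on blank lines, k = 3, the 0.3 threshold, and the `*.md` file pattern are all arbitrary assumptions.

```python
import itertools
import re
from pathlib import Path


def paragraphs(text: str) -> list[str]:
    """Split text into paragraphs on blank lines, dropping empty chunks."""
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Set of overlapping k-word sequences (lowercased) in the text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short paragraphs become a single shingle of all their words.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a: set, b: set) -> float:
    """Size of the intersection over size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicates(docs: dict[str, str], threshold: float = 0.3, k: int = 3):
    """Yield (score, (doc, paragraph_index), (doc, paragraph_index)) for
    paragraph pairs from different documents with similarity >= threshold.
    Every pair is compared, so cost grows quadratically with paragraph count."""
    items = [
        (name, i, shingles(para, k))
        for name, text in docs.items()
        for i, para in enumerate(paragraphs(text))
    ]
    for (n1, i1, s1), (n2, i2, s2) in itertools.combinations(items, 2):
        if n1 == n2:
            continue  # only report reuse across files
        score = jaccard(s1, s2)
        if score >= threshold:
            yield score, (n1, i1), (n2, i2)


def load_folder(folder: str, pattern: str = "*.md") -> dict[str, str]:
    """Read every file matching pattern under folder, recursively."""
    return {str(p): p.read_text(encoding="utf-8") for p in Path(folder).rglob(pattern)}


if __name__ == "__main__":
    docs = {
        "a.md": "The quick brown fox jumps over the lazy dog.\n\nCompletely unrelated paragraph here.",
        "b.md": "The quick brown fox leaps over the lazy dog.\n\nSomething else entirely.",
    }
    for score, left, right in near_duplicates(docs):
        print(f"{score:.2f} {left} {right}")
    # 0.40 ('a.md', 0) ('b.md', 0)
```

For a personal notes vault, comparing every pair is usually fine; for much larger collections, MinHash with locality-sensitive hashing is the standard way to avoid scoring every pair.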
Project mention: Show HN: Mine Insights from text using configurable LLM Prompt Chains | news.ycombinator.com | 2023-08-25
Python text-mining related posts
-
[Q] Does anyone use R to code qualitative data?
-
Language Input: a new web app for finding content to watch in your target language and keep track of your vocabulary
-
France: starting January 15, the health pass will be invalid "seven months after the last injection" in the absence of a booster dose
-
rake-nltk 1.0.6 released. Comes with the flexibility to choose your own sentence and word tokenizers.
-
[N] UK PhD Opportunity: Text mining the impact of SARS-CoV-2 mutations from the research literature at University of Glasgow
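The rake-nltk 1.0.6 release listed above is about letting users choose their own sentence and word tokenizers. The sketch below shows what that pluggability means for the RAKE scoring idea. Candidate phrases are runs of words between stopwords and punctuation, each word scores degree divided by frequency, and a phrase scores the sum of its word scores. It uses only the standard library; the names `rake`, `sentence_tokenizer`, and `word_tokenizer`, the naive default tokenizers, and the tiny stopword list are assumptions for illustration and are not necessarily rake-nltk's actual API.

```python
import re
from collections import defaultdict
from typing import Callable, Iterable

# Assumption: a deliberately tiny stopword list, for illustration only.
STOPWORDS = frozenset({"a", "an", "and", "are", "for", "in", "is", "of", "on", "the", "to", "with"})


def split_sentences(text: str) -> list[str]:
    """Naive default sentence tokenizer: split on '.', '!' and '?'."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def split_words(sentence: str) -> list[str]:
    """Naive default word tokenizer: words plus standalone punctuation."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def rake(
    text: str,
    sentence_tokenizer: Callable[[str], Iterable[str]] = split_sentences,
    word_tokenizer: Callable[[str], Iterable[str]] = split_words,
    stopwords: frozenset = STOPWORDS,
) -> list[tuple[str, float]]:
    """Rank candidate phrases with RAKE-style degree/frequency scoring."""
    # 1. Candidate phrases: runs of tokens broken by stopwords or punctuation.
    phrases: list[tuple[str, ...]] = []
    for sentence in sentence_tokenizer(text):
        current: list[str] = []
        for token in word_tokenizer(sentence):
            token = token.lower()
            if token in stopwords or not token.isalnum():
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(token)
        if current:
            phrases.append(tuple(current))

    # 2. Word statistics: frequency, and degree = summed length of the
    #    phrases the word occurs in (counted once per occurrence).
    freq: dict[str, int] = defaultdict(int)
    degree: dict[str, int] = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)

    # 3. Phrase score = sum of degree/frequency over its words;
    #    highest first, ties broken alphabetically.
    scores = {" ".join(p): sum(degree[w] / freq[w] for w in p) for p in phrases}
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    text = ("Keyword extraction is the task of extracting keywords. "
            "Rapid automatic keyword extraction is simple.")
    for phrase, score in rake(text)[:3]:
        print(f"{score:5.1f}  {phrase}")
    #  14.0  rapid automatic keyword extraction
    #   6.0  keyword extraction
    #   4.0  extracting keywords

    # Any callable can stand in for a tokenizer, e.g. plain whitespace splitting:
    print(rake(text, word_tokenizer=str.split)[0])
```

Because the tokenizers are plain callables, swapping in something heavier, such as NLTK's `sent_tokenize` and `word_tokenize` (which need NLTK's tokenizer data installed), requires no change to the scoring code.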
-
A note from our sponsor - SaaSHub
www.saashub.com | 10 May 2024
Index
What are some of the best open-source text-mining projects in Python? This list will help you:
Rank | Project | Stars
---|---|---
1 | texthero | 2,865
2 | trafilatura | 2,898
3 | scattertext | 2,203
4 | rake-nltk | 1,034
5 | huspacy | 148
6 | trrex | 134
7 | orange3-text | 124
8 | intertext | 110
9 | llmine_core | 31
10 | Answerable | 15
11 | corona-ml | 9