Top 23 Python Data Mining Projects
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Project mention: Tutorials on creating primitive ML algorithms from scratch? | reddit.com/r/learnmachinelearning | 2023-01-24
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Project mention: I made a website for a friend who owns a restaurant. He's wondering if there's a way to upload a picture of his menu daily. What is the best way to do this? | reddit.com/r/learnprogramming | 2023-01-15
-
This is our optimization problem. Now, we hope that you have an idea of what our goal is. Luckily for us, this is already implemented in a Python module called gensim. Yes, these guys are brilliant in natural language processing and we will make use of it. 🤝
-
Project mention: anomaly-detection-resources: NEW Extended Research - star count:6556.0 | reddit.com/r/algoprojects | 2022-11-15
-
Project mention: Pyod – A Comprehensive and Scalable Python Library for Outlier Detection | news.ycombinator.com | 2022-08-10
-
Project mention: Does anyone know a trusted Python package for applying Croston's Time series method? | reddit.com/r/pythontips | 2022-12-04
I initially used sktime's Croston class, but when I try to get the fitted values using the steps in the discussion on GitHub, the values are all the same: a straight line from the in-sample through the out-of-sample predictions.
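For context on the behaviour described above, here is a minimal pure-Python sketch of Croston's method. It is not sktime's implementation; the function name `croston`, its parameters, and the initialisation from the first non-zero demand are assumptions made for illustration (initialisation conventions vary between implementations). It smooths non-zero demand sizes and the intervals between them separately and forecasts their ratio, so the out-of-sample forecast is a constant level by construction.

```python
def croston(demand, alpha=0.1, horizon=3):
    """Return (fitted, forecast) for an intermittent demand series.

    fitted[t] is the one-step-ahead forecast for period t, made from
    data up to t-1 (None until the first non-zero demand is seen).
    forecast is the same final level repeated `horizon` times.
    """
    size = None        # smoothed non-zero demand size
    interval = None    # smoothed interval between non-zero demands
    periods_since = 1  # periods elapsed since the last non-zero demand
    fitted = []
    for y in demand:
        fitted.append(None if size is None else size / interval)
        if y > 0:
            if size is None:
                # Assumed initialisation: first observed size and interval.
                size, interval = float(y), float(periods_since)
            else:
                size += alpha * (y - size)
                interval += alpha * (periods_since - interval)
            periods_since = 1
        else:
            # Estimates are only updated when demand occurs.
            periods_since += 1
    level = None if size is None else size / interval
    return fitted, [level] * horizon


fitted, forecast = croston([2, 0, 0, 8, 0], alpha=0.5, horizon=3)
```

In this sketch the fitted values only change in the period after a non-zero demand, and every out-of-sample value equals the last level, so a flat forecast line is expected from the method itself.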
-
The only thing I can think of is Orange, which has some statistics capability, but that isn't its focus.
-
pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
-
Using regex: to match patterns in text after converting the PDF to plain text. Examples include invoice2data and traprange-invoice. However, this method requires knowledge of the format of the data fields.
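A minimal sketch of the regex approach described above, using only Python's standard `re` module. It does not use invoice2data or traprange-invoice; the sample text, field names, and patterns are hypothetical and assume the PDF has already been converted to plain text with a known layout.

```python
import re

# Hypothetical plain text, as it might look after a PDF-to-text conversion.
INVOICE_TEXT = """\
ACME Corp
Invoice Number: INV-2023-0042
Date: 2023-01-15
Total: EUR 1,234.50
"""

# One pattern per field; each assumes the label format shown above.
FIELD_PATTERNS = {
    "invoice_number": re.compile(r"Invoice Number:\s*(\S+)"),
    "date": re.compile(r"Date:\s*(\d{4}-\d{2}-\d{2})"),
    "total": re.compile(r"Total:\s*[A-Z]{3}\s*([\d,]+\.\d{2})"),
}


def extract_fields(text):
    """Map each field name to its matched string, or None if not found."""
    fields = {}
    for name, pattern in FIELD_PATTERNS.items():
        match = pattern.search(text)
        fields[name] = match.group(1) if match else None
    return fields


result = extract_fields(INVOICE_TEXT)
```

The patterns only work while the labels and value formats stay as assumed, which is the limitation noted above: a document with a different layout yields `None` for every field.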
-
-
Project mention: awesome-fraud-detection-papers: NEW Extended Research - star count:1195.0 | reddit.com/r/algoprojects | 2022-10-08
-
deep_gcns_torch
PyTorch repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org
-
Project mention: Monitor your system network traffic using one line of Python | news.ycombinator.com | 2022-09-28
-
Project mention: GitHub - JosephLai241/URS: Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python. | reddit.com/r/YoutubeFactory | 2022-05-27
-
instascrape
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
-
It's a new year 🎊 and what better way to kick off 2023 than with some Process Analytics news! January brought us exciting developments in the world of bpmn-visualization and pm4py integration 🔗. With our team working hard to connect the dots, we’re making bpmn-visualization more accessible and easier to integrate with the Process Mining ecosystem.
-
Project mention: Show HN: Unblob – extraction suite for 30+ file formats | news.ycombinator.com | 2023-01-18
Since you're the author and I see the tool is in Python: I'm the original author of UnityPack (https://github.com/hearthsim/unitypack). Nowadays the fork UnityPy is more powerful and maintained: https://github.com/K0lb3/UnityPy.
It's in Python and is able to deserialize Unity archives, treating them as a serialization format rather than a simple archive format. Feel free to email me if you want to integrate something like this or you have questions :)
-
-
-
matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Project mention: matrixprofile: NEW Data - star count:258.0 | reddit.com/r/algoprojects | 2022-05-21
-
Project mention: Any geolog users here? Managing log data | reddit.com/r/geologycareers | 2022-02-10
I might try to merge the .las files using lasio (https://github.com/kinverarity1/lasio).
-
grimoirelab-perceval
Send Sir Perceval on a quest to retrieve and gather data from software repositories.
-
PyPOTS
A Python toolbox / library for data mining on partially-observed time series, supporting tasks of forecasting / imputation / classification / clustering on incomplete (irregularly-sampled) multivariate time series with missing values.
Python Data Mining related posts
- Statistical Analysis software based on Python?
- Utilize OpenAI API to extract information from PDF files
- Show HN: Open-Source No-Code Platform for Machine Learning and Data Science
- Resources for data visualization (free & paid) for scientific publications
- Monitor your system network traffic using one line of Python
- Machine learning, concluded: Did the “no-code” tools beat manual analysis?
- Clustering and Heat map software (mac)
Index
What are some of the best open-source Data Mining projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | ML-From-Scratch | 21,894 |
2 | EasyOCR | 16,848 |
3 | gensim | 13,910 |
4 | anomaly-detection-resources | 6,807 |
5 | pyod | 6,677 |
6 | sktime | 6,077 |
7 | orange | 3,919 |
8 | pdftabextract | 2,037 |
9 | invoice2data | 1,362 |
10 | pycm | 1,347 |
11 | awesome-fraud-detection-papers | 1,275 |
12 | deep_gcns_torch | 1,002 |
13 | nfstream | 903 |
14 | URS | 550 |
15 | instascrape | 527 |
16 | pm4py-core | 514 |
17 | UnityPy | 487 |
18 | ADBench | 485 |
19 | ail-framework | 344 |
20 | matrixprofile | 303 |
21 | lasio | 301 |
22 | grimoirelab-perceval | 266 |
23 | PyPOTS | 198 |