Top 23 Python Data Mining Projects
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.
https://github.com/JaidedAI/EasyOCR
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
-
pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
-
catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
-
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
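The null-character warning in the comment above can be handled before a corpus is loaded into any tool. A minimal sketch using only the standard library; the function names `strip_nul` and `clean_corpus` and the sample strings are hypothetical, and whether this alone is enough for Orange3 is an assumption:

```python
def strip_nul(text: str) -> str:
    """Return text with every NUL character (U+0000) removed."""
    return text.replace("\x00", "")


def clean_corpus(documents):
    """Strip NUL characters from each document.

    Returns the cleaned documents and the number of documents
    that contained at least one NUL character.
    """
    cleaned = [strip_nul(doc) for doc in documents]
    affected = sum(1 for doc in documents if "\x00" in doc)
    return cleaned, affected


# Hypothetical sample corpus: the second document carries a stray NUL.
corpus = ["plain text", "bad\x00text", "more text"]
cleaned, affected = clean_corpus(corpus)
```

Counting the affected documents makes it easy to tell whether the corpus had the problem at all before it is passed on.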
-
pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
-
CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.
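For context, this is roughly what dialect detection looks like with the built-in `csv` module that CleverCSV is described as replacing. The sketch uses only the standard library, not CleverCSV itself; the sample data is hypothetical, and no claim is made here about CleverCSV's own function names:

```python
import csv
import io

# Hypothetical sample: a semicolon-delimited file, a common source of trouble
# when a reader assumes commas.
sample = "name;score;city\nAda;9.5;London\nBob;7.0;Paris\nCy;8.25;Oslo\n"

# Ask the built-in sniffer to guess the dialect, limited to a few candidate
# delimiters so that characters such as '.' are never considered.
dialect = csv.Sniffer().sniff(sample, delimiters=";,\t")

# Parse the sample with the detected dialect.
rows = list(csv.reader(io.StringIO(sample), dialect))
```

The built-in sniffer relies on simple heuristics and can misjudge messier files, which is the gap the description above says CleverCSV's improved detection targets.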
-
RD-Agent
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.
Project mention: RD-Agent: LLM-based autonomous evolving agents for industrial data-driven R&D | news.ycombinator.com | 2024-09-25
-
deep_gcns_torch
PyTorch repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org
-
PyPOTS
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values
-
matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Python Data Mining related posts
-
Hierarchical Clustering
-
Orange Data Mining
-
The Graph of Wikipedia [video]
-
Taxonomy Management?
-
Orange: Open-source machine learning and data visualization
-
Aeon: A unified framework for machine learning with time series
-
What exactly is AutoGPT?
Index
What are some of the best open-source Data Mining projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | EasyOCR | 24,697 |
2 | ML-From-Scratch | 23,893 |
3 | gensim | 15,705 |
4 | pyod | 8,611 |
5 | anomaly-detection-resources | 8,408 |
6 | catboost | 8,112 |
7 | sktime | 7,983 |
8 | orange | 4,892 |
9 | pdftabextract | 2,208 |
10 | invoice2data | 1,849 |
11 | awesome-fraud-detection-papers | 1,633 |
12 | pycm | 1,455 |
13 | CleverCSV | 1,267 |
14 | RD-Agent | 1,183 |
15 | deep_gcns_torch | 1,135 |
16 | PyPOTS | 1,119 |
17 | nfstream | 1,089 |
18 | aeon | 1,024 |
19 | ADBench | 868 |
20 | UnityPy | 871 |
21 | pm4py-core | 741 |
22 | ail-framework | 620 |
23 | matrixprofile | 362 |