Top 23 Python Data Mining Projects
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
https://github.com/JaidedAI/EasyOCR
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
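To give a flavour of the "from scratch" approach, here is a minimal sketch of single-feature linear regression fitted by ordinary least squares in closed form. It is not code from the ML-From-Scratch repository (which uses NumPy). It uses only the standard library, and the names `fit_line` and `predict_line` are hypothetical.

```python
from statistics import fmean


def fit_line(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Fit y = w*x + b by ordinary least squares (closed form, single feature)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    mean_x, mean_y = fmean(xs), fmean(ys)
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    w = sxy / sxx
    b = mean_y - w * mean_x  # the fitted line passes through (mean_x, mean_y)
    return w, b


def predict_line(w: float, b: float, xs: list[float]) -> list[float]:
    """Evaluate the fitted line at each x."""
    return [w * x + b for x in xs]


w, b = fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(w, b)  # 2.0 1.0
```

The closed form only works for this simple case; a from-scratch library would typically switch to gradient descent once there are many features or a non-quadratic loss.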
-
pyod
A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques
-
Project mention: anomaly-detection-resources: NEW Extended Research - star count:7507.0 | /r/algoprojects | 2023-10-24
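As an illustration of the kind of classical, non-learned detector that outlier libraries build on, the sketch below flags outliers with the modified z-score rule based on the median absolute deviation (MAD). This is not PyOD's API. The function name `mad_outliers` and its default threshold of 3.5 are assumptions chosen for the example.

```python
from statistics import median


def mad_outliers(values: list[float], threshold: float = 3.5) -> list[bool]:
    """Flag points whose modified z-score, 0.6745 * (x - median) / MAD, exceeds threshold."""
    med = median(values)
    mad = median(abs(v - med) for v in values)  # median absolute deviation
    if mad == 0:
        # At least half the points equal the median; this rule cannot score them.
        return [False] * len(values)
    return [abs(0.6745 * (v - med) / mad) > threshold for v in values]


print(mad_outliers([10, 11, 9, 10, 12, 10, 95]))
# [False, False, False, False, False, False, True]
```

The constant 0.6745 is the 0.75 quantile of the standard normal distribution, which makes the scaled MAD comparable to a standard deviation on normally distributed data. Using the median rather than the mean keeps a single extreme value from masking itself.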
-
catboost
A fast, scalable, high-performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks, for Python, R, Java and C++. Supports computation on CPU and GPU.
Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
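CatBoost's own algorithm (symmetric trees, ordered boosting, native handling of categorical features) is far more involved, but the core idea of gradient boosting on decision trees fits in a few lines. A minimal sketch, assuming squared-error regression on a single numeric feature with depth-1 trees (stumps). It is not CatBoost's implementation or API, and every name here is hypothetical.

```python
def fit_stump(xs: list[float], residuals: list[float]) -> tuple[float, float, float]:
    """Best single split x <= t, minimising the squared error of the residuals.

    Returns (threshold, left_value, right_value).
    """
    best = None
    for t in sorted(set(xs))[:-1]:  # splitting at the max would leave the right side empty
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lv, rv = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lv) ** 2 for r in left) + sum((r - rv) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lv, rv)
    if best is None:  # every x is identical: fall back to a constant leaf
        mean = sum(residuals) / len(residuals)
        return xs[0], mean, mean
    return best[1], best[2], best[3]


def fit_boosted(xs, ys, n_rounds=50, learning_rate=0.1):
    """Gradient boosting for squared loss: each stump fits the current residuals,
    which are exactly the negative gradient of 0.5 * (y - F(x)) ** 2."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]
        t, lv, rv = fit_stump(xs, residuals)
        stumps.append((t, lv, rv))
        # Shrink each tree's contribution by the learning rate.
        preds = [p + learning_rate * (lv if x <= t else rv) for x, p in zip(xs, preds)]
    return base, learning_rate, stumps


def predict_boosted(model, x):
    """Sum the base value and every shrunken stump output for one input."""
    base, learning_rate, stumps = model
    return base + learning_rate * sum(lv if x <= t else rv for t, lv, rv in stumps)


model = fit_boosted([1, 2, 3, 4, 5, 6], [1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
print(predict_boosted(model, 1), predict_boosted(model, 6))
```

A small learning rate means many weak trees are needed, but each one only nudges the prediction, which is what makes boosting resistant to overfitting any single split.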
-
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
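If a corpus does contain stray NUL characters, one workaround is to strip them before loading the files. A minimal sketch, assuming the files are UTF-8 or another ASCII-compatible encoding (in UTF-16, zero bytes are a normal part of the encoding, so this would corrupt the text). The helper names `strip_nul` and `clean_file` are hypothetical.

```python
from pathlib import Path


def strip_nul(data: bytes) -> tuple[bytes, int]:
    """Remove every NUL (0x00) byte and report how many were dropped."""
    cleaned = data.replace(b"\x00", b"")
    return cleaned, len(data) - len(cleaned)


def clean_file(src: Path, dst: Path) -> int:
    """Write a NUL-free copy of src to dst; return the number of bytes removed."""
    cleaned, removed = strip_nul(src.read_bytes())
    dst.write_bytes(cleaned)
    return removed
```

Working on bytes rather than decoded text means the cleanup still runs on files whose decoding would otherwise fail partway through.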
-
pdftabextract
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
-
CleverCSV
CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the built-in csv module with improved dialect detection, and comes with a handy command-line application for working with CSV files.
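For context, Python's standard library already ships a heuristic dialect detector, `csv.Sniffer`, which is the baseline CleverCSV's improved detection is measured against. A minimal sketch of the standard-library approach; the helper name `read_sniffed` and the candidate delimiter set are assumptions for the example. On messy files `sniff()` may guess wrongly or raise `csv.Error`, which is the kind of case CleverCSV targets.

```python
import csv
import io


def read_sniffed(text: str, delimiters: str = ",;\t|") -> list[list[str]]:
    """Guess the CSV dialect from a sample of the text, then parse all rows with it."""
    sample = text[:4096]  # sniffing a prefix keeps detection cheap on large files
    dialect = csv.Sniffer().sniff(sample, delimiters=delimiters)
    return list(csv.reader(io.StringIO(text), dialect))


rows = read_sniffed("a;b;c\n1;2;3\n4;5;6\n")
print(rows)  # [['a', 'b', 'c'], ['1', '2', '3'], ['4', '5', '6']]
```

Because CleverCSV describes itself as a drop-in replacement, code shaped like this should carry over; check its documentation for the exact names before switching.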
-
deep_gcns_torch
PyTorch repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org
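Graph convolutional networks are the building block these models stack deeply. A minimal sketch of a single layer using the widely used symmetric-normalised propagation rule, ReLU(D^-1/2 (A + I) D^-1/2 X W), in plain Python rather than PyTorch. It is not code from deep_gcns_torch, whose models add much more depth and machinery, and all names are hypothetical.

```python
import math

Matrix = list[list[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain triple-loop matrix product."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, features: Matrix, weight: Matrix) -> Matrix:
    """One graph-convolution step: ReLU(D^-1/2 (A + I) D^-1/2 X W)."""
    n = len(adj)
    # Add self-loops so each node keeps part of its own features.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_hat]
    # Symmetric normalisation keeps feature scale stable across node degrees.
    norm = [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    h = matmul(norm, matmul(features, weight))
    return [[max(0.0, v) for v in row] for row in h]


# Two connected nodes with one feature each and an identity weight.
print(gcn_layer([[0, 1], [1, 0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```

Each layer mixes a node's features with its neighbours'; stacking many such layers naively tends to blur node features together, which is the problem deeper GCN architectures try to address.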
-
PyPOTS
A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks (imputation, classification, clustering, forecasting, anomaly detection and cleaning) on incomplete industrial (irregularly-sampled) multivariate time series with NaN missing values
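Learned imputation models such as these are usually compared against naive baselines. A minimal sketch of one, linear interpolation, assuming a single regularly sampled series with gaps marked as NaN (multivariate or irregularly sampled data, which PyPOTS is built for, would need more than this). The function name `interpolate_missing` is hypothetical and not part of PyPOTS.

```python
import bisect
import math


def interpolate_missing(series: list[float]) -> list[float]:
    """Fill NaN gaps linearly between the nearest observed neighbours.

    Gaps before the first or after the last observation are filled with
    that nearest observed value (constant extrapolation).
    """
    observed = [i for i, v in enumerate(series) if not math.isnan(v)]
    if not observed:
        raise ValueError("series has no observed values")
    out = list(series)
    for i, v in enumerate(series):
        if not math.isnan(v):
            continue
        k = bisect.bisect_left(observed, i)  # observed[k-1] < i < observed[k]
        if k == 0:
            out[i] = series[observed[0]]
        elif k == len(observed):
            out[i] = series[observed[-1]]
        else:
            left, right = observed[k - 1], observed[k]
            out[i] = series[left] + (series[right] - series[left]) * (i - left) / (right - left)
    return out


nan = float("nan")
print(interpolate_missing([nan, 1.0, nan, nan, 4.0, nan]))
```

A baseline like this ignores correlations between variables and any seasonal structure, which is exactly the information neural imputation models try to exploit.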
-
RD-Agent
Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value, generic R&D processes through our open-source R&D automation tool RD-Agent, which lets AI drive data-driven AI.
Project mention: RD-Agent: LLM-based autonomous evolving agents for industrial data-driven R&D | news.ycombinator.com | 2024-09-25
-
First time coming across this, looks very cool! Definitely some ideas there that I'd like to implement for osintbuddy. Another project I'm going to be taking some ideas from is: https://github.com/ail-project/ail-framework - a modular framework to analyse potential information leaks
-
matrixprofile
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
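The matrix profile records, for every subsequence of a time series, the distance to its nearest matching subsequence elsewhere in the series. A minimal brute-force sketch, assuming z-normalised Euclidean distance and a half-window exclusion zone (one convention; implementations differ). It is far slower than the algorithms the library uses and is not its API; all names are hypothetical.

```python
import math


def _znorm(s: list[float]) -> list[float]:
    """Z-normalise a subsequence; a flat subsequence maps to all zeros."""
    mu = sum(s) / len(s)
    sd = math.sqrt(sum((x - mu) ** 2 for x in s) / len(s))
    return [(x - mu) / sd for x in s] if sd > 0 else [0.0] * len(s)


def naive_matrix_profile(ts: list[float], m: int) -> tuple[list[float], list[int]]:
    """Brute-force matrix profile: for every length-m subsequence, the
    z-normalised Euclidean distance to its nearest non-trivial match."""
    n = len(ts) - m + 1
    if m < 2 or n < 2:
        raise ValueError("need m >= 2 and at least two subsequences")
    subs = [_znorm(ts[i:i + m]) for i in range(n)]
    excl = math.ceil(m / 2)  # exclusion zone that skips trivial self-matches
    profile, index = [], []
    for i in range(n):
        best, best_j = math.inf, -1
        for j in range(n):
            if abs(i - j) < excl:
                continue
            d = math.sqrt(sum((a - b) ** 2 for a, b in zip(subs[i], subs[j])))
            if d < best:
                best, best_j = d, j
        profile.append(best)
        index.append(best_j)
    return profile, index


ts = [0, 1, 2, 1, 0, 5, 5, 5, 0, 1, 2, 1, 0]
profile, index = naive_matrix_profile(ts, 5)
print(profile[0], index[0])  # 0.0 8
```

The lowest profile values mark motifs (repeated patterns, like the bump at positions 0 and 8 here) and the highest mark discords (candidate anomalies).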
A note from our sponsor - SaaSHub
www.saashub.com | 9 Oct 2024
Index
What are some of the best open-source Data Mining projects in Python? This list will help you:
Rank | Project | Stars |
---|---|---|
1 | EasyOCR | 23,959 |
2 | ML-From-Scratch | 23,876 |
3 | gensim | 15,591 |
4 | pyod | 8,478 |
5 | anomaly-detection-resources | 8,273 |
6 | catboost | 8,025 |
7 | sktime | 7,784 |
8 | orange | 4,816 |
9 | pdftabextract | 2,208 |
10 | invoice2data | 1,811 |
11 | awesome-fraud-detection-papers | 1,606 |
12 | pycm | 1,445 |
13 | CleverCSV | 1,254 |
14 | deep_gcns_torch | 1,133 |
15 | nfstream | 1,075 |
16 | PyPOTS | 1,005 |
17 | aeon | 979 |
18 | ADBench | 824 |
19 | UnityPy | 809 |
20 | RD-Agent | 756 |
21 | pm4py-core | 708 |
22 | ail-framework | 574 |
23 | matrixprofile | 361 |