Python Data Mining

Open-source Python projects categorized as Data Mining

Top 23 Python Data Mining Projects

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

    Project mention: Decoding OCR: A Comprehensive Guide | dev.to | 2024-08-07

    https://github.com/JaidedAI/EasyOCR

  • ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

  • gensim

    Topic Modelling for Humans

  • pyod

    A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques

  • anomaly-detection-resources

    Anomaly detection related books, papers, videos, and toolboxes

  • catboost

    A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

    Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
  • sktime

    A unified framework for machine learning with time series

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...

  • pdftabextract

    A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

  • invoice2data

    Extract structured data from PDF invoices

  • awesome-fraud-detection-papers

    A curated list of data mining papers about fraud detection.

  • pycm

    Multi-class confusion matrix library in Python

  • CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

  • RD-Agent

    Research and development (R&D) is crucial for the enhancement of industrial productivity, especially in the AI era, where the core aspects of R&D are mainly focused on data and models. We are committed to automating these high-value generic R&D processes through our open source R&D automation tool RD-Agent, which lets AI drive data-driven AI.

    Project mention: RD-Agent: LLM-based autonomous evolving agents for industrial data-driven R&D | news.ycombinator.com | 2024-09-25
  • deep_gcns_torch

    PyTorch repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org

  • PyPOTS

    A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

  • nfstream

    NFStream: a Flexible Network Data Analysis Framework.

  • aeon

    A toolkit for machine learning from time series

    Project mention: FLaNK 15 Jan 2024 | dev.to | 2024-01-15
  • ADBench

    Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.

  • UnityPy

    UnityPy is a Python module that makes it possible to extract/unpack and edit Unity assets

  • pm4py-core

    Public repository for the PM4Py (Process Mining for Python) project.

  • ail-framework

    AIL framework - Analysis Information Leak framework

  • matrixprofile

    A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).
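
The Orange mention in the list above warns that a null character in the corpus can crash the tool. A minimal sketch, assuming the corpus is held as a list of Python strings, of stripping null characters before handing the text to any downstream tool. The function name and the sample data are hypothetical and are not part of Orange:

```python
def strip_null_chars(documents):
    """Return a copy of the documents with every null character removed."""
    return [doc.replace("\x00", "") for doc in documents]


# Hypothetical sample corpus; the second document contains a stray null character.
corpus = ["clean text", "bro\x00ken text"]
cleaned = strip_null_chars(corpus)
print(cleaned)
```

Whether this is enough depends on how the corpus is loaded; text read from files in binary mode would need decoding first.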

Index

What are some of the best open-source Data Mining projects in Python? This list will help you:

Rank Project Stars
1 EasyOCR 24,697
2 ML-From-Scratch 23,893
3 gensim 15,705
4 pyod 8,611
5 anomaly-detection-resources 8,408
6 catboost 8,112
7 sktime 7,983
8 orange 4,892
9 pdftabextract 2,208
10 invoice2data 1,849
11 awesome-fraud-detection-papers 1,633
12 pycm 1,455
13 CleverCSV 1,267
14 RD-Agent 1,183
15 deep_gcns_torch 1,135
16 PyPOTS 1,119
17 nfstream 1,089
18 aeon 1,024
19 ADBench 868
20 UnityPy 871
21 pm4py-core 741
22 ail-framework 620
23 matrixprofile 362

Did you know that Python is the most popular programming language based on number of mentions?