Python Data Mining

Open-source Python projects categorized as Data Mining

Top 23 Python Data Mining Projects

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, and Cyrillic.

    Project mention: Decoding OCR: A Comprehensive Guide | dev.to | 2024-08-07

    https://github.com/JaidedAI/EasyOCR

  • ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

  • gensim

    Topic Modelling for Humans

  • pyod

    A Python Library for Outlier and Anomaly Detection, Integrating Classical and Deep Learning Techniques

  • anomaly-detection-resources

    Anomaly detection related books, papers, videos, and toolboxes

    Project mention: anomaly-detection-resources: NEW Extended Research - star count:7507.0 | /r/algoprojects | 2023-10-24
  • catboost

    A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.

    Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
  • sktime

    A unified framework for machine learning with time series

  • orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...

  • pdftabextract

    A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.

  • invoice2data

    Extract structured data from PDF invoices

  • awesome-fraud-detection-papers

    A curated list of data mining papers about fraud detection.

  • pycm

    Multi-class confusion matrix library in Python

  • CleverCSV

    CleverCSV is a Python package for handling messy CSV files. It provides a drop-in replacement for the builtin CSV module with improved dialect detection, and comes with a handy command line application for working with CSV files.

  • deep_gcns_torch

    PyTorch repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org

  • nfstream

    NFStream: a Flexible Network Data Analysis Framework.

  • PyPOTS

    A Python toolkit/library for reality-centric machine/deep learning and data mining on partially-observed time series, including SOTA neural network models for scientific analysis tasks of imputation/classification/clustering/forecasting/anomaly detection/cleaning on incomplete industrial (irregularly-sampled) multivariate TS with NaN missing values

  • aeon

    A toolkit for machine learning from time series

    Project mention: FLaNK 15 Jan 2024 | dev.to | 2024-01-15
  • ADBench

    Official implementation of "ADBench: Anomaly Detection Benchmark", NeurIPS 2022.

  • UnityPy

    UnityPy is a Python module that makes it possible to extract/unpack and edit Unity assets

  • RD-Agent

    Research and development (R&D) is crucial for enhancing industrial productivity, especially in the AI era, where the core aspects of R&D focus mainly on data and models. We are committed to automating these high-value generic R&D processes through our open-source R&D automation tool RD-Agent, which lets AI drive data-driven AI.

    Project mention: RD-Agent: LLM-based autonomous evolving agents for industrial data-driven R&D | news.ycombinator.com | 2024-09-25
  • pm4py-core

    Public repository for the PM4Py (Process Mining for Python) project.

  • ail-framework

    AIL framework - Analysis Information Leak framework

    Project mention: Ask HN: Show me your half baked project | news.ycombinator.com | 2023-10-12

    First time coming across this, looks very cool! Definitely some ideas there that I'd like to implement for osintbuddy. Another project I'm going to be taking some ideas from is: https://github.com/ail-project/ail-framework - a modular framework to analyse potential information leaks

  • matrixprofile

    A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
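
The Orange mention above warns that a null character (`\x00`) in a text corpus can crash the tool. A minimal sketch of one way to strip such characters before loading the data, using only plain Python string handling. The helper name `strip_nulls` and the sample corpus are hypothetical and are not part of Orange or any listed project:

```python
def strip_nulls(texts):
    """Return a new list with every NUL character removed from each string."""
    return [text.replace("\x00", "") for text in texts]


# Hypothetical sample corpus containing stray NUL characters.
corpus = ["clean line", "bad\x00line", "\x00"]
cleaned = strip_nulls(corpus)
print(cleaned)
```

Whether empty strings left behind after cleaning should also be dropped depends on the downstream tool, so this sketch keeps them.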

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Index

What are some of the best open-source Data Mining projects in Python? This list will help you:

Rank Project Stars
1 EasyOCR 23,959
2 ML-From-Scratch 23,876
3 gensim 15,591
4 pyod 8,478
5 anomaly-detection-resources 8,273
6 catboost 8,025
7 sktime 7,784
8 orange 4,816
9 pdftabextract 2,208
10 invoice2data 1,811
11 awesome-fraud-detection-papers 1,606
12 pycm 1,445
13 CleverCSV 1,254
14 deep_gcns_torch 1,133
15 nfstream 1,075
16 PyPOTS 1,005
17 aeon 979
18 ADBench 824
19 UnityPy 809
20 RD-Agent 756
21 pm4py-core 708
22 ail-framework 574
23 matrixprofile 361

Did you know that Python is the most popular programming language based on number of mentions?