Top 23 Python Data Mining Projects
Machine Learning From Scratch. Bare-bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
Project mention: Tutorials on creating primitive ML algorithms from scratch? | reddit.com/r/learnmachinelearning | 2023-01-24
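To give a flavour of the "from scratch" style described above, here is a minimal sketch (not code from the repository) of least-squares linear regression trained with batch gradient descent in plain NumPy. The function name `fit_linear_regression` and the default learning rate and iteration count are illustrative assumptions.

```python
import numpy as np


def fit_linear_regression(X, y, lr=0.1, n_iters=2000):
    """Fit y ~ X @ w + b by batch gradient descent on the mean squared error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_samples, n_features = X.shape
    w = np.zeros(n_features)
    b = 0.0
    for _ in range(n_iters):
        residual = X @ w + b - y  # prediction error for every sample
        # Gradients of (1/n) * sum(residual**2) with respect to w and b
        grad_w = (2.0 / n_samples) * (X.T @ residual)
        grad_b = (2.0 / n_samples) * residual.sum()
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.uniform(-1, 1, size=(200, 2))
    y = X @ np.array([3.0, -2.0]) + 0.5  # noiseless target: w = [3, -2], b = 0.5
    print(fit_linear_regression(X, y))
```

A closed-form solver such as `np.linalg.lstsq` would be faster here; gradient descent is shown because it is the same update loop that later generalizes to neural networks.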
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Project mention: I made a website for a friend who owns a restaurant. He's wondering if there's a way to upload a picture of his menu daily. What is the best way to do this? | reddit.com/r/learnprogramming | 2023-01-15
Topic Modelling for Humans
Project mention: Understanding How Dynamic node2vec Works on Streaming Data | dev.to | 2022-12-23
This is our optimization problem. Now, we hope that you have an idea of what our goal is. Luckily for us, this is already implemented in a Python module called gensim. Yes, these guys are brilliant at natural language processing, and we will make use of it. 🤝
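In node2vec, gensim handles the embedding step (presumably the skip-gram objective the quoted post calls "our optimization problem"), while the random walks that serve as its training "sentences" are generated beforehand. The sketch below illustrates node2vec-style second-order biased walks in pure Python: the return parameter `p` and in-out parameter `q` reweight each step relative to the previous node. The graph format (a dict of neighbour sets) and the function names `node2vec_walk` and `generate_walks` are hypothetical, and edge weights are ignored for simplicity.

```python
import random


def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=random):
    """Generate one node2vec-style biased random walk.

    graph: dict mapping each node to a set of neighbours (undirected, unweighted).
    p: return parameter; a larger p makes stepping back to the previous node less likely.
    q: in-out parameter; q > 1 keeps the walk local, q < 1 pushes it outward.
    """
    walk = [start]
    while len(walk) < length:
        current = walk[-1]
        neighbours = sorted(graph[current])
        if not neighbours:
            break  # dead end: stop the walk early
        if len(walk) == 1:
            # The first step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbours))
            continue
        previous = walk[-2]
        weights = []
        for nxt in neighbours:
            if nxt == previous:
                weights.append(1.0 / p)  # return to where we came from
            elif nxt in graph[previous]:
                weights.append(1.0)      # stays at distance 1 from previous
            else:
                weights.append(1.0 / q)  # moves to distance 2 from previous
        walk.append(rng.choices(neighbours, weights=weights, k=1)[0])
    return walk


def generate_walks(graph, num_walks, length, p=1.0, q=1.0, seed=0):
    """Start num_walks walks from every node; returns a list of node sequences."""
    rng = random.Random(seed)
    walks = []
    nodes = sorted(graph)
    for _ in range(num_walks):
        rng.shuffle(nodes)
        for node in nodes:
            walks.append(node2vec_walk(graph, node, length, p, q, rng))
    return walks
```

With node IDs converted to strings, the walks could then be passed to gensim 4.x as, for example, `Word2Vec(sentences=walks, vector_size=64, window=5, min_count=0, sg=1)`; those hyperparameter values are illustrative only.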
Anomaly detection related books, papers, videos, and toolboxes
Project mention: anomaly-detection-resources: NEW Extended Research - star count:6556.0 | reddit.com/r/algoprojects | 2022-11-15
A Comprehensive and Scalable Python Library for Outlier Detection (Anomaly Detection)
Project mention: Pyod – A Comprehensive and Scalable Python Library for Outlier Detection | news.ycombinator.com | 2022-08-10
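PyOD bundles many detectors behind a common interface. As a library-free illustration of one classic idea, the k-nearest-neighbour distance score, here is a minimal pure-Python sketch: points whose k-th nearest neighbour is far away receive high scores. The function names are hypothetical, and the thresholding rule (flag the top `contamination` fraction) only loosely mirrors the idea behind PyOD's `contamination` parameter; this is not PyOD's implementation.

```python
import math


def knn_outlier_scores(points, k=3):
    """Score each point by the Euclidean distance to its k-th nearest neighbour.

    Requires k < len(points).
    """
    scores = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])  # larger distance => more isolated => more anomalous
    return scores


def flag_outliers(points, k=3, contamination=0.1):
    """Label the top `contamination` fraction of points (by score) as outliers (1)."""
    scores = knn_outlier_scores(points, k)
    n_outliers = max(1, round(contamination * len(points)))
    ranked = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
    outliers = set(ranked[:n_outliers])
    labels = [1 if i in outliers else 0 for i in range(len(points))]
    return labels, scores
```

This brute-force version is O(n²); real toolkits use spatial indexes or approximate neighbour search for larger datasets.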
A unified framework for machine learning with time series
Project mention: Does anyone know a trusted Python package for applying Croston's Time series method? | reddit.com/r/pythontips | 2022-12-04
I initially used sktime's Croston class, but when I try to get the fitted values using the steps in the discussion on GitHub, the values are all the same: a straight line throughout the in-sample and out-of-sample predictions.
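Part of that flat line is expected. Classic Croston's method smooths the sizes of nonzero demands and the intervals between them separately, and its forecast is their ratio, which stays constant across the whole out-of-sample horizon. In-sample one-step-ahead values, by contrast, should change after each nonzero demand. The pure-Python sketch below shows both; the smoothing constant, the initialization from the first nonzero observation, and the function name `croston` are our assumptions, not necessarily what sktime does.

```python
def croston(demand, alpha=0.1, horizon=1):
    """Classic Croston's method for intermittent demand.

    Returns (fitted, forecast): one-step-ahead in-sample fitted values (NaN
    before the first nonzero demand) and a constant out-of-sample forecast
    repeated `horizon` times.
    """
    z = None            # smoothed size of nonzero demands
    x = None            # smoothed interval between nonzero demands
    periods_since = 1   # periods elapsed since the last nonzero demand
    fitted = []
    for d in demand:
        # The forecast for this period uses only information from earlier periods.
        fitted.append(z / x if z is not None else float("nan"))
        if d > 0:
            if z is None:
                z, x = float(d), float(periods_since)  # initialize on the first demand
            else:
                z += alpha * (d - z)
                x += alpha * (periods_since - x)
            periods_since = 1
        else:
            periods_since += 1
    forecast_value = z / x if z is not None else float("nan")
    return fitted, [forecast_value] * horizon
```

For example, `croston([0, 3, 0, 0, 6, 0, 3], alpha=0.5, horizon=3)` yields fitted values that step from 1.5 to 1.8 after the demand of 6, while all three forecast values are identical.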
🍊 📊 💡 Orange: Interactive data analysis
Project mention: Statistical Analysis software based on Python? | reddit.com/r/Python | 2023-01-28
The only thing I can think of is Orange, which has some statistics capability, but that isn't its focus.
A set of tools for extracting tables from PDF files helping to do data mining on (OCR-processed) scanned documents.
Extract structured data from PDF invoices
Project mention: Utilize OpenAI API to extract information from PDF files | dev.to | 2023-01-28
Using regex: match patterns in the text after converting the PDF to plain text. Examples include invoice2data and traprange-invoice. However, this method requires knowing the format of the data fields in advance.
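The regex approach can be sketched with the standard `re` module alone, assuming the PDF has already been converted to plain text. The field names, patterns, and the `extract_fields` helper are hypothetical and fit only one imagined invoice layout; invoice2data organizes similar patterns into per-issuer templates, and this sketch does not reproduce its template format.

```python
import re

# Hypothetical patterns for one invoice layout; each issuer's layout needs its own.
# The leading \b keeps "Total" from matching inside words such as "Subtotal".
FIELD_PATTERNS = {
    "invoice_number": r"\bInvoice\s*(?:No\.?|Number)\s*[:#]?\s*([A-Z0-9-]+)",
    "date": r"\bDate\s*:\s*(\d{4}-\d{2}-\d{2})",
    "amount": r"\bTotal\s*(?:Due)?\s*:?\s*\$?\s*([\d,]+\.\d{2})",
}


def extract_fields(text, patterns=FIELD_PATTERNS):
    """Return {field: first captured group, or None if the pattern is absent}."""
    result = {}
    for field, pattern in patterns.items():
        match = re.search(pattern, text, flags=re.IGNORECASE)
        result[field] = match.group(1) if match else None
    return result


def parse_amount(value):
    """Convert a captured amount such as '1,234.50' to a float."""
    return float(value.replace(",", ""))
```

The word-boundary detail illustrates the drawback the paragraph mentions: every pattern encodes assumptions about a specific document format, and a small layout change can silently capture the wrong value.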
Multi-class confusion matrix library in Python
Project mention: PyCM 3.8 Released: Distance/Similarity Support | news.ycombinator.com | 2023-02-02
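A multi-class confusion matrix simply counts (actual, predicted) label pairs, and per-class statistics follow from its rows and columns. PyCM computes this and many more statistics from actual and predicted vectors; the minimal pure-Python sketch below only illustrates the underlying counting, and its function names are our own rather than PyCM's API.

```python
from collections import Counter


def confusion_matrix(actual, predicted):
    """Return (labels, matrix) where matrix[i][j] counts actual=labels[i], predicted=labels[j]."""
    labels = sorted(set(actual) | set(predicted))
    counts = Counter(zip(actual, predicted))
    matrix = [[counts[(a, p)] for p in labels] for a in labels]
    return labels, matrix


def per_class_precision_recall(labels, matrix):
    """Precision from each column, recall from each row; 0.0 when undefined (a convention)."""
    stats = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        predicted_as_label = sum(row[i] for row in matrix)  # column sum
        actually_label = sum(matrix[i])                     # row sum
        precision = tp / predicted_as_label if predicted_as_label else 0.0
        recall = tp / actually_label if actually_label else 0.0
        stats[label] = (precision, recall)
    return stats
```

Returning 0.0 for an undefined ratio is one of several conventions; dedicated libraries typically let you see that a statistic is undefined rather than silently substituting a number.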
A curated list of data mining papers about fraud detection.
Project mention: awesome-fraud-detection-papers: NEW Extended Research - star count:1195.0 | reddit.com/r/algoprojects | 2022-10-08
PyTorch Repo for DeepGCNs (ICCV'2019 Oral, TPAMI'2021), DeeperGCN (arXiv'2020) and GNN1000 (ICML'2021): https://www.deepgcns.org
NFStream: a Flexible Network Data Analysis Framework.
Project mention: Monitor your system network traffic using one line of Python | news.ycombinator.com | 2022-09-28
Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python.
Project mention: GitHub - JosephLai241/URS: Universal Reddit Scraper - A comprehensive Reddit scraping command-line tool written in Python. | reddit.com/r/YoutubeFactory | 2022-05-27
Powerful and flexible Instagram scraping library for Python, providing easy-to-use and expressive tools for accessing data programmatically
Public repository for the PM4Py (Process Mining for Python) project.
Project mention: Process Analytics - January 2023 News | dev.to | 2023-02-01
It's a new year 🎊 and what better way to kick off 2023 than with some Process Analytics news! January brought us exciting developments in the world of bpmn-visualization and pm4py integration 🔗. With our team working hard to connect the dots, we’re making bpmn-visualization more accessible and easier to integrate with the Process Mining ecosystem.
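A core building block in process mining is the directly-follows graph: for every case in an event log, count how often activity A is immediately followed by activity B. PM4Py ships its own discovery functions for this; the minimal pure-Python sketch below only illustrates the underlying counting. Its event-log format (a list of `(case_id, activity, timestamp)` tuples) and the function name `directly_follows_graph` are our own assumptions.

```python
from collections import Counter, defaultdict


def directly_follows_graph(events):
    """Count directly-follows relations in an event log.

    events: iterable of (case_id, activity, timestamp) tuples, in any order.
    Returns (dfg, start_activities, end_activities) as Counters, where
    dfg[(a, b)] is how often b directly follows a within the same case.
    """
    traces = defaultdict(list)
    for case_id, activity, timestamp in events:
        traces[case_id].append((timestamp, activity))

    dfg, starts, ends = Counter(), Counter(), Counter()
    for case_events in traces.values():
        case_events.sort(key=lambda e: e[0])  # order each case by timestamp (stable on ties)
        activities = [activity for _, activity in case_events]
        starts[activities[0]] += 1
        ends[activities[-1]] += 1
        for a, b in zip(activities, activities[1:]):
            dfg[(a, b)] += 1
    return dfg, starts, ends
```

The resulting counts are what a BPMN or process-map visualization would typically draw as weighted arrows between activities.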
UnityPy is a Python module that makes it possible to extract/unpack and edit Unity assets.
Project mention: Show HN: Unblob – extraction suite for 30+ file formats | news.ycombinator.com | 2023-01-18
Since you're the author and I see the tool is in Python: I'm the original author of UnityPack (https://github.com/hearthsim/unitypack; nowadays, the fork UnityPy is more powerful and is maintained: https://github.com/K0lb3/UnityPy).
It's in Python and is able to deserialize Unity archives, treating them as a serialization format rather than a simple archive format. Feel free to email me if you want to integrate something like this or you have questions :)
Official implementation of "ADBench: Anomaly Detection Benchmark".
Project mention: ADBench: Anomaly Detection Benchmark | news.ycombinator.com | 2022-06-30
AIL framework - Analysis Information Leak framework
A Python 3 library making time series data mining tasks, utilizing matrix profile algorithms, accessible to everyone.
Project mention: matrixprofile: NEW Data - star count:258.0 | reddit.com/r/algoprojects | 2022-05-21
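To make the matrix profile concrete: for each length-`m` subsequence of a series, it records the z-normalized Euclidean distance to that subsequence's nearest non-trivial match elsewhere in the series. Low values point to repeated patterns (motifs) and high values to discords. The brute-force NumPy sketch below illustrates only the definition and is not how matrixprofile computes it; the half-window exclusion zone is one common convention we assume here, and the function names are hypothetical.

```python
import numpy as np


def znorm(x):
    """Z-normalize a subsequence; constant subsequences become all zeros."""
    std = x.std()
    return (x - x.mean()) / std if std > 0 else np.zeros_like(x)


def naive_matrix_profile(series, m):
    """Brute-force self-join matrix profile with window length m.

    Returns (profile, index): profile[i] is the z-normalized Euclidean distance
    from subsequence i to its nearest neighbour outside the exclusion zone,
    and index[i] is that neighbour's starting position.
    """
    series = np.asarray(series, dtype=float)
    n_subs = len(series) - m + 1
    subs = np.array([znorm(series[i:i + m]) for i in range(n_subs)])
    exclusion = max(1, m // 2)  # ignore trivial matches that overlap subsequence i
    profile = np.full(n_subs, np.inf)
    index = np.full(n_subs, -1)
    for i in range(n_subs):
        dists = np.sqrt(((subs - subs[i]) ** 2).sum(axis=1))
        lo, hi = max(0, i - exclusion), min(n_subs, i + exclusion + 1)
        dists[lo:hi] = np.inf
        j = int(np.argmin(dists))
        profile[i], index[i] = dists[j], j
    return profile, index
```

This runs in roughly O(n²·m) time, which is why practical libraries rely on much faster exact or approximate algorithms for long series.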
Python library for reading and writing well data using Log ASCII Standard (LAS) files
Project mention: Any geolog users here? Managing log data | reddit.com/r/geologycareers | 2022-02-10
I might try to merge the .las files using lasio (https://github.com/kinverarity1/lasio).
Send Sir Perceval on a quest to retrieve and gather data from software repositories.
A Python toolbox / library for data mining on partially-observed time series, supporting tasks of forecasting / imputation / classification / clustering on incomplete (irregularly-sampled) multivariate time series with missing values.
Project mention: PyPOTS: NEW Data - star count:182.0 | reddit.com/r/algoprojects | 2023-01-14
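PyPOTS focuses on learned models for incomplete multivariate series. Purely as a point of reference, here is a minimal NumPy sketch of a simple baseline one might compare such models against: per-feature linear interpolation over time, with edge gaps filled by the nearest observed value. It is not PyPOTS's method, and the `(time_steps, features)` layout with `NaN` marking missing values, as well as the function name `interpolate_missing`, are our assumptions.

```python
import numpy as np


def interpolate_missing(X):
    """Fill NaNs in a (time_steps, features) array by per-feature linear interpolation.

    Leading/trailing gaps take the nearest observed value (np.interp's default
    behaviour); a feature with no observations at all is left as NaN.
    """
    X = np.array(X, dtype=float)  # copy so the caller's array is not modified
    t = np.arange(X.shape[0])
    for f in range(X.shape[1]):
        col = X[:, f]  # view into the copy, so assignments below update X
        observed = ~np.isnan(col)
        if observed.any() and not observed.all():
            col[~observed] = np.interp(t[~observed], t[observed], col[observed])
    return X
```

Such a baseline ignores correlations between features and assumes evenly spaced time steps, which are exactly the gaps that dedicated imputation models aim to address.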
Python Data Mining related posts
Statistical Analysis software based on Python?
1 project | reddit.com/r/Python | 28 Jan 2023
Utilize OpenAI API to extract information from PDF files
2 projects | dev.to | 28 Jan 2023
Show HN: Open-Source No-Code Platform for Machine Learning and Data Science
2 projects | news.ycombinator.com | 1 Jan 2023
Resources for data visualization (free & paid) for scientific publications
2 projects | reddit.com/r/datascience | 17 Nov 2022
Monitor your system network traffic using one line of Python
1 project | news.ycombinator.com | 28 Sep 2022
Machine learning, concluded: Did the “no-code” tools beat manual analysis?
1 project | news.ycombinator.com | 17 Aug 2022
Clustering and Heat map software (mac)
1 project | reddit.com/r/bioinformatics | 1 Jul 2022