Top 23 Data Mining Open-Source Projects
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
-
awesome-datascience
An awesome Data Science repository to learn and apply for real-world problems.
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
-
LightGBM
A fast, distributed, high performance gradient boosting (GBT, GBDT, GBRT, GBM or MART) framework based on decision tree algorithms, used for ranking, classification and many other machine learning tasks.
-
awesome-production-machine-learning
A curated list of awesome open source libraries to deploy, monitor, version and scale your machine learning
-
python-machine-learning-book
The "Python Machine Learning (1st edition)" book code repository and info resource
-
catboost
A fast, scalable, high performance Gradient Boosting on Decision Trees library, used for ranking, classification, regression and other machine learning tasks for Python, R, Java, C++. Supports computation on CPU and GPU.
-
pdftabextract
A set of tools for extracting tables from PDF files, helping to do data mining on (OCR-processed) scanned documents.
Project mention: About Data analyst, data scientist and data engineer, resources and experiences | dev.to | 2024-03-26
Awesome Data Science by Academic
Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | dev.to | 2023-12-27
PyTesseract Module [GitHub], EasyOCR Module [GitHub], PaddlePaddle OCR [GitHub]
Project mention: SIRUS.jl: Interpretable Machine Learning via Rule Extraction | /r/Julia | 2023-06-29
SIRUS.jl is a pure Julia implementation of the SIRUS algorithm by Bénard et al. (2021). The algorithm is a rule-based machine learning model, meaning that it is fully interpretable. It achieves this by first fitting a random forest and then converting this forest to rules. Furthermore, the algorithm is stable and achieves a predictive performance that is comparable to LightGBM, a state-of-the-art gradient boosting model created by Microsoft. Interpretability, stability, and predictive performance are described in more detail below.
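The "fit a forest, then convert it to rules" idea can be illustrated with a small Python sketch. This is not SIRUS.jl's implementation: the nested-dict tree encoding, the function names `extract_rules` and `frequent_rules`, and the choice to keep rules by how often they recur across trees are all assumptions made here for illustration, and the sketch assumes split thresholds repeat exactly across trees.

```python
from collections import Counter

# Hypothetical tree encoding (an assumption, not SIRUS.jl's format):
# an internal node is {"feature", "threshold", "left", "right"};
# a leaf is {"value"}.

def extract_rules(node, path=()):
    """Yield (conditions, prediction) for every root-to-leaf path."""
    if "value" in node:
        yield path, node["value"]
        return
    feat, thr = node["feature"], node["threshold"]
    yield from extract_rules(node["left"], path + ((feat, "<", thr),))
    yield from extract_rules(node["right"], path + ((feat, ">=", thr),))

def frequent_rules(forest, min_fraction):
    """Keep condition sets that occur in at least min_fraction of the trees."""
    counts = Counter()
    for tree in forest:
        # Count each distinct condition set at most once per tree.
        counts.update({conds for conds, _ in extract_rules(tree)})
    return sorted(c for c, n in counts.items() if n / len(forest) >= min_fraction)

# Toy forest: two identical stumps on "x" and one stump on "y".
stump_x = {"feature": "x", "threshold": 1.0,
           "left": {"value": 0}, "right": {"value": 1}}
stump_y = {"feature": "y", "threshold": 2.0,
           "left": {"value": 0}, "right": {"value": 1}}
forest = [stump_x, stump_x, stump_y]
rules = frequent_rules(forest, min_fraction=0.5)
```

Here only the two conditions on `x` survive, because each appears in two of the three trees, while the conditions on `y` appear in just one.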
Project mention: Exploring Open-Source Alternatives to Landing AI for Robust MLOps | dev.to | 2023-12-13
One trove of treasures is the awesome-production-machine-learning repository on GitHub. This curated list provides a multitude of frameworks, libraries, and software designed to facilitate various stages of the ML lifecycle.
Project mention: A Comprehensive Guide for Building Rag-Based LLM Applications | news.ycombinator.com | 2023-09-13
This is a feature in many commercial products already, as well as open source libraries like PyOD. https://github.com/yzhao062/pyod
Project mention: anomaly-detection-resources: NEW Extended Research - star count:7507.0 | /r/algoprojects | 2023-10-24
Project mention: CatBoost: Open-source gradient boosting library | news.ycombinator.com | 2024-03-05
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
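The comment does not say how to avoid the null-character problem. One possible workaround, sketched here under the assumption that the corpus is available as a list of Python strings before it is loaded into Orange, is to strip the character up front. The helper name `strip_nulls` is hypothetical and not part of Orange.

```python
def strip_nulls(documents):
    """Return copies of the documents with every null character removed."""
    return [doc.replace("\x00", "") for doc in documents]

# Toy corpus with embedded null characters (illustrative data only).
corpus = ["clean text", "bad\x00text", "\x00"]
cleaned = strip_nulls(corpus)
```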
Project mention: awesome-TS-anomaly-detection: NEW Data - star count:2694.0 | /r/algoprojects | 2023-11-21
Project mention: Digitized Continuous Magnetic Recordings for the 1859 Carrington Event | news.ycombinator.com | 2024-04-23
Something similar which is more recently-maintained: https://github.com/automeris-io/WebPlotDigitizer
Project mention: Show HN: Open-source, browser-local data exploration using DuckDB-WASM and PRQL | news.ycombinator.com | 2024-03-15
[2] https://github.com/Kanaries/graphic-walker/issues/330
Data Mining related posts
- Digitized Continuous Magnetic Recordings for the 1859 Carrington Event
- Hierarchical Clustering
- Orange Data Mining
- The Graph of Wikipedia [video]
- Taxonomy Management?
- awesome-TS-anomaly-detection: NEW Data - star count:2694.0
Index
What are some of the best open-source Data Mining projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | ML-From-Scratch | 23,164 |
2 | awesome-datascience | 23,101 |
3 | EasyOCR | 21,882 |
4 | LightGBM | 16,043 |
5 | awesome-production-machine-learning | 15,947 |
6 | gensim | 15,236 |
7 | python-machine-learning-book | 12,076 |
8 | pyod | 7,941 |
9 | anomaly-detection-resources | 7,858 |
10 | catboost | 7,744 |
11 | sktime | 7,404 |
12 | awesome-ml-for-cybersecurity | 6,769 |
13 | Ferret | 5,616 |
14 | orange | 4,604 |
15 | datascience | 4,071 |
16 | textract | 3,778 |
17 | kaggle-solutions | 3,745 |
18 | awesome-TS-anomaly-detection | 2,811 |
19 | WebPlotDigitizer | 2,496 |
20 | bolt | 2,463 |
21 | graphic-walker | 2,223 |
22 | pdftabextract | 2,152 |
23 | invoice2data | 1,694 |