SaaSHub helps you find the best software and product alternatives Learn more →
Top 23 Python scikit-learn Projects
-
ailearning
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
Project mention: A ranked list of machine learning Python libraries. Updated weekly | news.ycombinator.com | 2025-01-31
-
From what I've seen, there are sort of two paths. I'll provide a well-known example from each.
1. lang specific distributed task library
For example, in Python, Celery is a pretty popular task system. If you (the dev) are the one writing all the code and running the workflows, it might work well for you. You build the core code and functions, and it handles the processing and resource stuff with a little config.
* https://github.com/celery/celery
Or lower level:
* https://github.com/dask/dask
2. DAG Workflow systems
There are also whole systems for what you're describing. They've gotten especially popular in the MLOps and data engineering world. A common one is Airflow:
* https://github.com/apache/airflow
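To make the "DAG workflow" idea concrete, here is a minimal sketch using only the standard library (Python 3.9+ for `graphlib`). It is not Celery's, Dask's, or Airflow's API: `run_dag`, the task names, and the dict-of-upstream-results convention are hypothetical. It runs each task once all of its upstream tasks have finished, and runs independent tasks in parallel on a thread pool, which is the core scheduling job those systems automate (alongside retries, distribution across machines, and monitoring, which this sketch omits).

```python
from concurrent.futures import FIRST_COMPLETED, ThreadPoolExecutor, wait
from graphlib import TopologicalSorter


def run_dag(tasks, deps, max_workers=4):
    """Run callables in dependency order, executing independent ones in parallel.

    tasks: name -> callable taking a dict {upstream_name: upstream_result}
    deps:  name -> set of upstream task names (every task must appear as a key)
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    results = {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        running = {}  # future -> task name
        while sorter.is_active():
            # Submit every task whose dependencies have all finished.
            for name in sorter.get_ready():
                upstream = {d: results[d] for d in deps.get(name, ())}
                running[pool.submit(tasks[name], upstream)] = name
            # Block until at least one task finishes, then unlock its dependents.
            done, _ = wait(running, return_when=FIRST_COMPLETED)
            for fut in done:
                name = running.pop(fut)
                results[name] = fut.result()  # re-raises if the task failed
                sorter.done(name)
    return results


# Hypothetical pipeline: "stats" and "sort" only depend on "clean", so they can run concurrently.
deps = {
    "extract": set(),
    "clean": {"extract"},
    "stats": {"clean"},
    "sort": {"clean"},
    "report": {"stats", "sort"},
}
tasks = {
    "extract": lambda up: [3, 1, 2, 2],
    "clean": lambda up: sorted(set(up["extract"])),
    "stats": lambda up: sum(up["clean"]),
    "sort": lambda up: list(reversed(up["clean"])),
    "report": lambda up: f"sum={up['stats']} desc={up['sort']}",
}
results = run_dag(tasks, deps)
print(results["report"])
```

The topological sorter decides *what* may run; the executor decides *where* it runs. Swapping the thread pool for a process pool or a remote worker queue is roughly the step from this toy to a task library like Celery or Dask.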
-
tpot
A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.
Project mention: Evolve Your Machine Learning: Automate the Process of Model Selection through TPOT. | dev.to | 2024-07-06
Resources: TPOT Documentation, Genetic Programming
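TPOT evolves whole tree-shaped scikit-learn pipelines with genetic programming. The sketch below is a much simpler fixed-length genetic algorithm, shown only to illustrate the evolve/select/recombine/mutate loop. It is not TPOT's API: the search space `SPACE`, the population settings, and especially `fitness` (a toy stand-in for a cross-validated score) are all hypothetical.

```python
import random

# Hypothetical search space: a "pipeline" is one choice per slot.
SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["tree", "forest", "logreg"],
    "depth": [2, 4, 8, 16],
}


def fitness(cfg):
    """Toy stand-in for a cross-validated score; a real run would fit and score each pipeline."""
    score = {"tree": 0.70, "forest": 0.80, "logreg": 0.75}[cfg["model"]]
    score += {"none": 0.0, "standard": 0.05, "minmax": 0.03}[cfg["scaler"]]
    score -= abs(cfg["depth"] - 8) * 0.01  # pretend depth 8 is the sweet spot
    return score


def random_cfg(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each slot is inherited from one parent at random.
    return {k: (a if rng.random() < 0.5 else b)[k] for k in SPACE}


def mutate(cfg, rng, rate=0.2):
    # Each slot is resampled with probability `rate`.
    return {k: rng.choice(SPACE[k]) if rng.random() < rate else v for k, v in cfg.items()}


def evolve(generations=15, pop_size=12, elite=4, seed=0):
    rng = random.Random(seed)
    pop = [random_cfg(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[:elite]  # truncation selection; elites survive unchanged
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = parents + children
    return max(pop, key=fitness)


best = evolve()
print(best, round(fitness(best), 3))
```

Because the elites are copied into every new generation, the best score never gets worse from one generation to the next; the expensive part in a real AutoML run is that each `fitness` call means training and cross-validating a model.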
-
🌐 Composed Image Retrieval 📎 Intro to Multimodal Llama 3.2 🛠️ Multi Agent Concierge 💻 RAG with Langchain Granite, Milvus 🫶 Download content ✅ Transformer Replacement? 🤖 vLLM for running models 🌐 Amphion 📝 Autogluon 🚙 Notebook Llama like Google's NotebookLM 🫶 Monocle2ai for tracing GenAI app code LFA&D Project 🤖 Bee Agent Framework ✅ Llama RFP Response ▶️ GenAI Script 👽 Simular AI Agent S 🦾 DrawDB with AI ✨ Ollama with Llama 3.2 Vision!!!! Preview 🚕 Powerful RAG Checker 📊 SQL Generator 💻 Role of LLMs 🐍 Document Extraction 🕶️ Open Source Vector DB Reddit 🍔 The Practical Guide to Self-Hosting LLMs 🦾 Stagehand Controller 🕶️ Understanding HNSWLIB 🐍 Best practices in RAG 💻 Enigma Agent 📝 Langchain, Ollama, Phi3 for Function Calling 🔋 Compass Judger 📝 Princeton NLP SimPO 🍔 Princeton NLP ProLong 🔋 Princeton NLP HELMET 🧐 Ollama Cheatsheet 🚕 Princeton NLP CopyCat 📊 Princeton NLP Shp 🕶️ Can LLMs Solve Hard GitHub Issues 📝 Enabling Large Language Models to Generate Text with Citations 🔋 Princeton NLP CharXiv 📊 Awesome AI Agents List 🦾 Nomic’s Matryoshka text embedding model
-
Do you mean this example? https://github.com/adap/flower/tree/main/examples/quickstart...
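Flower's quickstart examples follow the usual federated learning loop: each client trains on its own data, and the server aggregates the resulting parameters. A common aggregation rule (Flower ships a FedAvg strategy) is Federated Averaging, a weighted mean of client parameters where each client counts in proportion to its number of training examples. The sketch below is a stdlib illustration of that averaging step only, not Flower's API; `fed_avg` and the flat-list parameter format are hypothetical.

```python
def fed_avg(client_updates):
    """Weighted average of client parameter vectors (FedAvg aggregation step).

    client_updates: list of (weights, num_examples) pairs, where weights is a flat list of floats.
    """
    total = sum(n for _, n in client_updates)
    if total == 0:
        raise ValueError("no training examples reported")
    dim = len(client_updates[0][0])
    avg = [0.0] * dim
    for weights, n in client_updates:
        if len(weights) != dim:
            raise ValueError("all clients must send parameters of the same shape")
        for i, w in enumerate(weights):
            avg[i] += w * n / total  # clients with more data pull the average harder
    return avg


# Two hypothetical clients: the second saw three times as much data as the first.
global_weights = fed_avg([([1.0, 2.0], 10), ([3.0, 4.0], 30)])
print(global_weights)
```

For a scikit-learn linear model, the "weights" would typically be the flattened coefficients and intercept; the server sends the averaged vector back to every client for the next round.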
-
I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.
Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.
https://orangedatamining.com/
https://orange3.readthedocs.io/projects/orange-visual-progra...
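Following the commenter's warning about null characters, one simple precaution is to strip NUL (U+0000) characters from a text corpus before loading it into Orange3 or any other tool. This is a minimal stdlib sketch; `strip_nul`, `clean_corpus_file`, and the UTF-8 default are assumptions, not part of Orange3.

```python
import tempfile
from pathlib import Path


def strip_nul(text):
    """Remove NUL (U+0000) characters from a string."""
    return text.replace("\x00", "")


def clean_corpus_file(src, dst, encoding="utf-8"):
    """Copy a text corpus from src to dst without NUL characters; return how many were removed."""
    raw = Path(src).read_text(encoding=encoding, errors="replace")
    Path(dst).write_text(strip_nul(raw), encoding=encoding)
    return raw.count("\x00")


# Usage on a throwaway file containing two stray NULs.
with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "corpus.txt"
    src.write_text("first doc\x00\nsecond\x00 doc\n", encoding="utf-8")
    removed = clean_corpus_file(src, Path(d) / "corpus_clean.txt")
    cleaned_text = (Path(d) / "corpus_clean.txt").read_text(encoding="utf-8")

print(removed, repr(cleaned_text))
```

`errors="replace"` keeps the cleanup from failing on files that are not valid UTF-8; if the source encoding is known, pass it explicitly instead.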
-
Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials
A comprehensive list of Deep Learning / Artificial Intelligence and Machine Learning tutorials - rapidly expanding into areas of AI/Deep Learning / Machine Vision / NLP and industry-specific areas such as Climate / Energy, Automotive, Retail, Pharma, Medicine, Healthcare, Policy, Ethics and more.
-
mljar-supervised
Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation
Project mention: Show HN: Supertree – interactive visualization of decision trees in Python | news.ycombinator.com | 2024-08-27
We would like to keep the package sustainable. Earlier, we created an AutoML package under the MIT license (https://github.com/mljar/mljar-supervised), and it is very hard to monetize it; you need funds to keep a package maintained and keep working on it.
Regarding purchasing, we just haven't had time to create a landing page with a buy button :) We will add it soon. The package will cost 499 USD per year. We already have a few finance companies interested.
-
igel
a delightful machine learning tool that allows you to train, test, and use models without writing code
-
m2cgen
Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
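The idea behind m2cgen is to walk a trained model's structure and emit standalone source code that needs no ML library at prediction time. The sketch below hand-rolls that idea for a tiny decision tree stored as nested dicts; the tree format and `tree_to_python` are hypothetical, and this is not m2cgen's API or output format (m2cgen reads fitted models such as scikit-learn estimators directly). Python is used as the target language here only so the generated code can be executed and checked in the same script.

```python
def tree_to_python(node, fn_name="score", indent="    "):
    """Emit dependency-free Python source for a small decision tree.

    node is either {"leaf": value} or
    {"feature": i, "threshold": t, "left": node, "right": node}; go left when x[i] <= t.
    """
    lines = [f"def {fn_name}(x):"]

    def emit(n, depth):
        pad = indent * depth
        if "leaf" in n:
            lines.append(f"{pad}return {n['leaf']!r}")
            return
        lines.append(f"{pad}if x[{n['feature']}] <= {n['threshold']!r}:")
        emit(n["left"], depth + 1)
        lines.append(f"{pad}else:")
        emit(n["right"], depth + 1)

    emit(node, 1)
    return "\n".join(lines) + "\n"


# Hypothetical two-level tree over two features.
tree = {
    "feature": 0, "threshold": 2.5,
    "left": {"leaf": 0.1},
    "right": {"feature": 1, "threshold": 0.0, "left": {"leaf": 0.4}, "right": {"leaf": 0.9}},
}
source = tree_to_python(tree)
print(source)

# Load the generated function to confirm it behaves like the tree.
namespace = {}
exec(source, namespace)
score = namespace["score"]
```

The generated function is just nested `if`/`else` on feature thresholds, which is why this approach ports easily to C, Go, or any other target: only the syntax templates change.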
-
mars
Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.
-
Project mention: Genetically synthesized supergain broadband wire-bundle antenna | news.ycombinator.com | 2024-07-31
For those who're only distantly aware of the kind of problem this solves (like me), the wikipedia link further elaborates:
https://en.wikipedia.org/wiki/Symbolic_regression
and turns out there's a Python package
https://github.com/MilesCranmer/PySR
I've needed something like this at least once (but IIRC no more than twice ;) ), so I'm glad to know what to look for next time, thanks for the rabbit hole!
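Symbolic regression searches over mathematical expressions instead of fitting the parameters of a fixed model form. PySR does this with an evolutionary search over expression trees; the sketch below swaps that for brute-force enumeration of a tiny hypothetical grammar (five primitives plus their pairwise compositions), fitting a scale and offset `y ~ a * expr + b` by least squares and keeping the expression with the lowest mean squared error. It is not PySR's API, and `PRIMITIVES`, `search`, and the test data are assumptions for illustration.

```python
import math

# Hypothetical primitive library: name template -> function.
PRIMITIVES = {
    "{}": lambda v: v,
    "sq({})": lambda v: v * v,
    "sin({})": math.sin,
    "cos({})": math.cos,
    "exp({})": math.exp,
}


def candidate_features():
    """Yield (expression, function) for every primitive and every composition f(g(x))."""
    for name, f in PRIMITIVES.items():
        yield name.format("x"), f
    for outer, f in PRIMITIVES.items():
        for inner, g in PRIMITIVES.items():
            if outer == "{}" or inner == "{}":
                continue  # compositions with identity duplicate single primitives
            yield outer.format(inner.format("x")), (lambda v, f=f, g=g: f(g(v)))


def fit_linear(z, y):
    """Least-squares (a, b) for y ~ a*z + b; returns None if z is constant."""
    n = len(z)
    mz, my = sum(z) / n, sum(y) / n
    var = sum((zi - mz) ** 2 for zi in z)
    if var == 0:
        return None
    a = sum((zi - mz) * (yi - my) for zi, yi in zip(z, y)) / var
    return a, my - a * mz


def search(xs, ys):
    best = None
    for expr, f in candidate_features():
        try:
            z = [f(x) for x in xs]
        except (OverflowError, ValueError):
            continue  # skip expressions that blow up on this data
        fit = fit_linear(z, ys)
        if fit is None:
            continue
        a, b = fit
        mse = sum((a * zi + b - yi) ** 2 for zi, yi in zip(z, ys)) / len(xs)
        if best is None or mse < best[0]:
            best = (mse, expr, a, b)
    return best


xs = [i / 10 for i in range(-20, 21)]
ys = [3 * math.sin(x * x) + 1 for x in xs]  # hidden "law" to rediscover
mse, expr, a, b = search(xs, ys)
print(f"y ~ {a:.3f} * {expr} + {b:.3f}  (mse={mse:.2e})")
```

Enumeration explodes combinatorially as the grammar grows, which is exactly why real tools use guided search and penalize expression complexity.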
-
Python scikit-learn related posts
-
Learn Machine Learning with these GitHub repositories
-
AlphaPy: a machine learning framework built on sklearn and pandas. Supports pyfolio, XGBoost, LightGBM, and CatBoost (gradient boosting on decision trees), among others. Examples include financial market prediction, sports prediction, and Kaggle. Configurations are set through
-
Tradero: A tool for achieving self-funding via trading
-
Scikit-learn Stock Prediction: using fundamental and pricing data to predict future stock returns. scikit-learn's random forest classifier is trained, and the author claimed positive live trading results. Not actively maintained. Other Models - star count: 1520
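A classifier that "predicts future stock returns" needs a label for each row, and the exact rule this project uses isn't given here. The sketch below assumes one common setup: label a time step 1 when the stock beats a benchmark by more than a hypothetical `margin` over the next `horizon` periods. It also splits rows chronologically with a gap of `horizon` rows, because each label looks `horizon` steps ahead and a plain random split would leak future prices into training. All names and numbers are hypothetical.

```python
def forward_return(prices, t, horizon):
    """Simple return from time t to time t + horizon."""
    return prices[t + horizon] / prices[t] - 1.0


def outperformance_labels(stock, benchmark, horizon, margin=0.0):
    """1 at time t if the stock beats the benchmark by more than `margin` over the next
    `horizon` periods, else 0. The last `horizon` points get no label (future unknown)."""
    if len(stock) != len(benchmark):
        raise ValueError("price series must be aligned")
    labels = []
    for t in range(len(stock) - horizon):
        excess = forward_return(stock, t, horizon) - forward_return(benchmark, t, horizon)
        labels.append(1 if excess > margin else 0)
    return labels


def chronological_split(n, test_fraction=0.25, gap=0):
    """Train/test index ranges where training ends `gap` rows before testing starts,
    so no training label peeks at prices inside the test period."""
    cut = int(n * (1 - test_fraction))
    return range(0, max(0, cut - gap)), range(cut, n)


# Hypothetical aligned price series.
stock = [10, 11, 12, 11, 13, 14]
benchmark = [100, 101, 102, 103, 104, 105]
labels = outperformance_labels(stock, benchmark, horizon=2, margin=0.05)
train, test = chronological_split(len(labels), test_fraction=0.25, gap=2)
print(labels, list(train), list(test))
```

Feature rows indexed by `train` (fundamentals and pricing data at each time t) together with `labels` could then be passed to a classifier such as scikit-learn's `RandomForestClassifier`, and evaluated only on the later `test` rows.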
-
Hyperactive Version 4.5 Released
-
A note from our sponsor - SaaSHub
www.saashub.com | 9 Feb 2025
Index
What are some of the best open-source scikit-learn projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | ailearning | 40,001 |
2 | data-science-ipython-notebooks | 27,812 |
3 | best-of-ml-python | 19,552 |
4 | Dask | 12,909 |
5 | mlcourse.ai | 9,899 |
6 | tpot | 9,834 |
7 | autogluon | 8,345 |
8 | sktime | 8,174 |
9 | auto-sklearn | 7,718 |
10 | featuretools | 7,359 |
11 | flower | 5,393 |
12 | orange | 4,967 |
13 | yellowbrick | 4,312 |
14 | Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials | 3,817 |
15 | scikit-llm | 3,413 |
16 | hummingbird | 3,381 |
17 | mljar-supervised | 3,106 |
18 | igel | 3,096 |
19 | m2cgen | 2,843 |
20 | mars | 2,712 |
21 | PySR | 2,592 |
22 | modAL | 2,254 |
23 | MachineLearningStocks | 1,804 |