Python scikit-learn

Open-source Python projects categorized as scikit-learn

Top 23 Python scikit-learn Projects

scikit-learn
  1. ailearning

    AiLearning: data analysis + machine learning in practice + linear algebra + PyTorch + NLTK + TF2

    Project mention: Top Github repositories for 10+ programming languages | dev.to | 2024-07-16

    Ai learning

  3. data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  4. best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: A ranked list of machine learning Python libraries. Updated weekly | news.ycombinator.com | 2025-01-31
  5. Dask

    Parallel computing with task scheduling

    Project mention: Ask HN: What's the right tool for this job? | news.ycombinator.com | 2024-07-20

    From what I've seen, there are sort of two paths. I'll provide a well-known example from each.

    1. Language-specific distributed task library

    For example, in Python, Celery is a pretty popular task system. If you (the dev) are the one writing all the code and running the workflows, it might work well for you. You build the core code and functions, and it handles the processing and resource stuff with a little config.

    * https://github.com/celery/celery

    Or lower level:

    * https://github.com/dask/dask

    2. DAG Workflow systems

    There are also whole systems for what you're describing. They've gotten especially popular in the ML ops and data engineering world. A common one is Airflow:

    * https://github.com/apache/airflow
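    To make the DAG idea concrete, here is a minimal single-process sketch using only Python's standard library (`graphlib`, Python 3.9+). It is not Airflow's or Dask's API, and the task names and toy data are made up. Each task receives the results of its upstream tasks, and `TopologicalSorter` guarantees dependencies run first. Real workflow systems add what this leaves out: distributed workers, retries, scheduling, and persistence.

```python
from graphlib import TopologicalSorter


def run_dag(tasks, deps):
    """Run each task after its upstream tasks, passing their results in.

    tasks: name -> callable(dict of upstream results) -> result
    deps:  name -> set of upstream task names
    """
    results = {}
    # static_order() yields every node after all of its predecessors.
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: results[d] for d in deps.get(name, ())}
        results[name] = tasks[name](upstream)
    return results


# Hypothetical extract -> clean -> train -> report pipeline with toy data.
deps = {
    "extract": set(),
    "clean": {"extract"},
    "train": {"clean"},
    "report": {"train", "clean"},
}
tasks = {
    "extract": lambda up: [3, 1, 2, None],
    "clean": lambda up: [v for v in up["extract"] if v is not None],
    "train": lambda up: sum(up["clean"]) / len(up["clean"]),  # the "model" is just a mean
    "report": lambda up: f"mean={up['train']} over {len(up['clean'])} rows",
}

print(run_dag(tasks, deps)["report"])  # mean=2.0 over 3 rows
```

    `TopologicalSorter` also offers `get_ready()` and `done()`, which let you hand every task whose dependencies are finished to a pool of workers in parallel instead of running them one by one.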

  6. mlcourse.ai

    Open Machine Learning Course

  7. tpot

    A Python Automated Machine Learning tool that optimizes machine learning pipelines using genetic programming.

    Project mention: Evolve Your Machine Learning: Automate the Process of Model Selection through TPOT. | dev.to | 2024-07-06

    Resources: TPOT Documentation, Genetic Programming
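    TPOT evolves whole pipeline trees with genetic programming. The sketch below is a deliberately simpler relative, a genetic algorithm over a fixed set of pipeline choices, meant only to show the select / crossover / mutate loop. The search space and `toy_fitness` are hypothetical stand-ins (in practice fitness would be a cross-validated score), and none of this is TPOT's API.

```python
import random

# Hypothetical search space: one choice per pipeline step.
SPACE = {
    "scaler": ["none", "standard", "minmax"],
    "model": ["tree", "knn", "logreg"],
    "select_k": [2, 4, 8, 16],  # features kept by a hypothetical selector step
}


def toy_fitness(cfg):
    """Stand-in for a cross-validated score (made-up numbers, higher is better)."""
    score = {"none": 0.0, "standard": 0.2, "minmax": 0.1}[cfg["scaler"]]
    score += {"tree": 0.3, "knn": 0.1, "logreg": 0.2}[cfg["model"]]
    return score - abs(cfg["select_k"] - 8) / 100


def random_cfg(rng):
    return {k: rng.choice(v) for k, v in SPACE.items()}


def crossover(a, b, rng):
    # Uniform crossover: each step is inherited from either parent.
    return {k: a[k] if rng.random() < 0.5 else b[k] for k in SPACE}


def mutate(cfg, rng, rate=0.2):
    # Each step is re-drawn at random with probability `rate`.
    return {k: rng.choice(SPACE[k]) if rng.random() < rate else v for k, v in cfg.items()}


def evolve(generations=20, pop_size=12, elite=2, seed=0):
    rng = random.Random(seed)
    pop = [random_cfg(rng) for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=toy_fitness, reverse=True)
        parents = pop[: pop_size // 2]  # truncation selection: keep the top half
        children = [
            mutate(crossover(rng.choice(parents), rng.choice(parents), rng), rng)
            for _ in range(pop_size - elite)
        ]
        pop = pop[:elite] + children  # elitism: the best configs always survive
    return max(pop, key=toy_fitness)


best = evolve()
print(best, round(toy_fitness(best), 3))
```

    Because of elitism, the best fitness in the population never decreases from one generation to the next.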

  8. autogluon

    Fast and Accurate ML in 3 Lines of Code

    Project mention: AIM Weekly for 04Nov2024 | dev.to | 2024-11-04

    🌐 Composed Image Retrieval 📎 Intro to Multimodal LLama 3.2 🛠️ Multi Agent Concierge 💻 RAG with Langchain Granite, Milvus 🫶 Download content ✅ Transformer Replacement? 🤖 vLLM for running models 🌐 Amphion 📝 Autogluon 🚙 Notebook LLama like Google's NotebookLM 🫶 Monocle2ai for tracing GenAI app code LFA&D Project 🤖 Bee Agent Framework ✅ LLama RFP Response ▶️ GenAI Script 👽 Simular AI Agent S 🦾 DrawDB with AI ✨ Ollama with LLama 3.2 Vision!!!! Preview 🚕 Powerful RAG Checker 📊 SQL Generator 💻 Role of LLMs 🐍 Document Extraction 🕶️ Open Source Vector DB Reddit 🍔 The Practical Guide to Self Hosting LLM 🦾 Stagehand Controller 🕶️ Understanding HNSWLIB 🐍 Best practices in RAG 💻 Enigma Agent 📝 Langchain, Ollama, Phi3 for Function Calling 🔋 Compass Judger 📝 Princeton NLP SimPO 🍔 Princeton NLP ProLong 🔋 Princeton NLP HELMET 🧐 Ollama Cheatsheet 🚕 Princeton NLP CopyCat 📊 Princeton NLP Shp 🕶️ Can LLM Solve Hard Github Issues 📝 Enabling Large Language Models to Generate Text with Citations 🔋 Princeton NLP CharXiv 📊 Awesome AI Agents List 🦾 Nomic’s Matryoshka text embedding model

  10. sktime

    A unified framework for machine learning with time series
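    A common way to do machine learning on time series is to reduce forecasting to ordinary regression: slide a window over the series so past values become features and the next value becomes the target. Below is a stdlib-only sketch of that idea with a one-lag linear model fitted by least squares. The function names are hypothetical and this is not sktime's API.

```python
def make_windows(series, window):
    """Sliding windows: X[i] = series[i:i+window], y[i] = series[i+window]."""
    n = len(series) - window
    X = [series[i : i + window] for i in range(n)]
    y = [series[i + window] for i in range(n)]
    return X, y


def fit_one_lag(series):
    """Least-squares fit of y_t = a + b * y_{t-1} on windows of length 1."""
    X, y = make_windows(series, 1)
    xs = [row[0] for row in X]
    mx, my = sum(xs) / len(xs), sum(y) / len(y)
    # Assumes the series is not constant (otherwise the denominator is zero).
    b = sum((u - mx) * (v - my) for u, v in zip(xs, y)) / sum((u - mx) ** 2 for u in xs)
    return my - b * mx, b


def forecast(series, steps):
    """Recursive multi-step forecast: each prediction is fed back in as the next input."""
    a, b = fit_one_lag(series)
    preds, last = [], series[-1]
    for _ in range(steps):
        last = a + b * last
        preds.append(last)
    return preds


print(forecast([1, 2, 3, 4, 5], steps=2))  # [6.0, 7.0]
```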

  11. auto-sklearn

    Automated Machine Learning with scikit-learn

  12. featuretools

    An open source python library for automated feature engineering
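    Automated feature engineering of this kind typically builds features by aggregating a child table (e.g., transactions) for each row of a parent table (e.g., customers). A hand-rolled sketch of that single step follows. The tables are made-up data, the feature names only mimic featuretools' naming style, and the library itself automates and stacks many such primitives across related tables.

```python
from collections import defaultdict


def aggregate_features(parent_ids, child_rows):
    """Per-parent COUNT / SUM / MEAN over (parent_id, amount) child rows."""
    amounts = defaultdict(list)
    for parent_id, amount in child_rows:
        amounts[parent_id].append(amount)
    features = {}
    for pid in parent_ids:
        vals = amounts.get(pid, [])  # parents with no children get empty aggregates
        features[pid] = {
            "COUNT(transactions)": len(vals),
            "SUM(transactions.amount)": sum(vals),
            "MEAN(transactions.amount)": sum(vals) / len(vals) if vals else None,
        }
    return features


# Made-up parent (customers) and child (transactions) tables.
customers = ["c1", "c2", "c3"]
transactions = [("c1", 20.0), ("c1", 35.0), ("c2", 10.0)]
for cid, feats in aggregate_features(customers, transactions).items():
    print(cid, feats)
```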

  13. flower

    Flower: A Friendly Federated AI Framework (by adap)

    Project mention: Flower 1.15.0 | news.ycombinator.com | 2025-02-03

    Do you mean this example? https://github.com/adap/flower/tree/main/examples/quickstart...
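    In federated learning, clients train locally and a server only aggregates their model parameters. The classic aggregation rule, federated averaging (FedAvg), is a sample-weighted mean of the client parameters. Flower ships a built-in FedAvg strategy, but the function below is an independent stdlib sketch with hypothetical names, treating each model as a flat list of floats.

```python
def fed_avg(client_params, client_sizes):
    """Average client parameter vectors, weighting each client by its local sample count."""
    total = sum(client_sizes)
    n_params = len(client_params[0])
    return [
        sum(params[j] * n for params, n in zip(client_params, client_sizes)) / total
        for j in range(n_params)
    ]


# Two hypothetical clients: one trained on 1 example, the other on 3.
print(fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3]))  # [2.5, 3.5]
```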

  14. orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: Hierarchical Clustering | news.ycombinator.com | 2024-04-20

    I know I've tooted its horn before, but Orange3 is a pretty neat Python-based GUI platform that makes this and a metric buttload of other statistical/ML techniques available to non-programmer types.

    Just watch out for the null character `\x00` in the corpus. That always seems to kill it stone dead.

    https://orangedatamining.com/

    https://orange3.readthedocs.io/projects/orange-visual-progra...
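    One way to sidestep the null-character problem is to scrub the corpus before loading it into Orange. A minimal sketch, assuming a UTF-8 text file; the function names are hypothetical.

```python
from pathlib import Path


def strip_nulls(text):
    """Remove NUL characters ("\\x00"), which some text loaders choke on."""
    return text.replace("\x00", "")


def clean_corpus_file(src, dst):
    # Assumes UTF-8 input; undecodable bytes become U+FFFD instead of raising.
    raw = Path(src).read_text(encoding="utf-8", errors="replace")
    Path(dst).write_text(strip_nulls(raw), encoding="utf-8")
```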

  15. yellowbrick

    Visual analysis and diagnostic tools to facilitate machine learning model selection.

  16. Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials

    A comprehensive list of Deep Learning / Artificial Intelligence and Machine Learning tutorials - rapidly expanding into areas of AI/Deep Learning / Machine Vision / NLP and industry specific areas such as Climate / Energy, Automotives, Retail, Pharma, Medicine, Healthcare, Policy, Ethics and more.

  17. scikit-llm

    Seamlessly integrate LLMs into scikit-learn.
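    The appeal is reusing scikit-learn's `fit` / `predict` convention. Here is a hypothetical sketch of a zero-shot text classifier in that style, where `llm` is any callable mapping a prompt string to a reply. The class name and prompt are assumptions, not scikit-llm's API, and the class does not subclass scikit-learn's `BaseEstimator`.

```python
class LLMZeroShotClassifier:
    """Hypothetical fit/predict wrapper around an LLM; `llm` maps a prompt str to a reply str."""

    def __init__(self, llm):
        self.llm = llm

    def fit(self, X, y):
        # Zero-shot: nothing is trained; only the label set is recorded.
        self.classes_ = sorted(set(y))
        return self

    def predict(self, X):
        preds = []
        for text in X:
            prompt = (
                f"Classify the text into one of {self.classes_}. "
                f"Answer with the label only.\nText: {text}"
            )
            answer = self.llm(prompt).strip()
            # Fall back to the first known label if the reply is off-list.
            preds.append(answer if answer in self.classes_ else self.classes_[0])
        return preds


def fake_llm(prompt):
    # Stand-in "LLM" so the sketch runs offline.
    return "positive" if "great" in prompt else "negative"


clf = LLMZeroShotClassifier(fake_llm).fit(["x", "y"], ["negative", "positive"])
print(clf.predict(["great product", "awful support"]))  # ['positive', 'negative']
```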

  18. hummingbird

    Hummingbird compiles trained ML models into tensor computation for faster inference.
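    One way to turn a decision tree into tensor computation is to express it as matrix multiplications, roughly the GEMM strategy described by the Hummingbird authors. The pure-Python sketch below hard-codes a hypothetical two-split tree (go left when `x[f] < threshold`). Hummingbird itself derives these matrices from the trained model and runs them on a tensor backend such as PyTorch.

```python
def matmul(a, b):
    """Plain-Python product of an n x k and a k x m list-of-lists matrix."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


# Hypothetical tree: node0 tests x[0] < 0.5; if true go to node1, else leaf L2.
#                    node1 tests x[1] < 0.3; if true leaf L0, else leaf L1.
A = [[1, 0],        # A[f][i] = 1 if internal node i tests feature f
     [0, 1]]
B = [0.5, 0.3]      # threshold of each internal node
C = [[1, 1, -1],    # C[i][l] = 1 if leaf l is in node i's left subtree,
     [1, -1, 0]]    #          -1 if in its right subtree, 0 otherwise
D = [2, 1, 0]       # number of "go left" edges on the path to each leaf
E = [10.0, 20.0, 30.0]  # leaf values


def predict(X):
    # 1) Evaluate every split at once: T[r][i] = 1 if sample r satisfies node i's condition.
    T = [[1 if v < b else 0 for v, b in zip(row, B)] for row in matmul(X, A)]
    # 2) A leaf is reached iff every decision on its path agrees with it, i.e. S == D.
    S = matmul(T, C)
    return [sum(e for s, d, e in zip(row, D, E) if s == d) for row in S]


print(predict([[0.2, 0.1], [0.2, 0.9], [0.9, 0.1]]))  # [10.0, 20.0, 30.0]
```

    Every sample evaluates every node, which wastes arithmetic compared with walking the tree, but the work becomes a few dense matrix products that batch well on vectorized hardware.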

  19. mljar-supervised

    Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation

    Project mention: Show HN: Supertree – interactive visualization of decision trees in Python | news.ycombinator.com | 2024-08-27

    We would like to keep the package sustainable. Earlier, we created an AutoML package under the MIT license (https://github.com/mljar/mljar-supervised), and it is very hard to monetise it, and you need funds to keep a package maintained and keep working on it.

    Regarding purchasing, we just haven't had time to create a landing page with a buy button :) We will add it soon. The package will cost 499 USD per year. We already have a few finance companies interested.

  20. igel

    A delightful machine learning tool that allows you to train, test, and use models without writing code

  21. m2cgen

    Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies
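    For a linear model, "transpiling" amounts to printing the learned coefficients as arithmetic in the target language. A minimal sketch that emits a C function; the helper is hypothetical and its output format is not m2cgen's (the library itself exposes exporters such as `m2cgen.export_to_c(model)` for fitted estimators).

```python
def linear_model_to_c(coefs, intercept, func_name="score"):
    """Emit a dependency-free C function computing intercept + sum(coef_i * input[i])."""
    terms = [repr(float(intercept))]
    terms += [f"input[{i}] * {float(c)!r}" for i, c in enumerate(coefs)]
    return (
        f"double {func_name}(const double *input) {{\n"
        f"    return {' + '.join(terms)};\n"
        "}\n"
    )


# Hypothetical fitted coefficients, e.g. read off a trained linear regression.
print(linear_model_to_c([2.0, -0.5], 1.0))
```

    Tree ensembles follow the same idea, but emit nested `if`/`else` blocks instead of a single sum.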

  22. mars

    Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

  23. PySR

    High-Performance Symbolic Regression in Python and Julia

    Project mention: Genetically synthesized supergain broadband wire-bundle antenna | news.ycombinator.com | 2024-07-31

    For those who're only distantly aware of the kind of problem this solves (like me), the wikipedia link further elaborates:

    https://en.wikipedia.org/wiki/Symbolic_regression

    and turns out there's a Python package

    https://github.com/MilesCranmer/PySR

    I've needed something like this at least once (but IIRC no more than twice ;) ), so I'm glad to know what to look for next time, thanks for the rabbit hole!
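    Symbolic regression searches over mathematical expressions, not just coefficients, for one that fits the data. PySR does this with an evolutionary search backed by the Julia package SymbolicRegression.jl; the sketch below swaps that for plain brute-force enumeration of small expression trees, which only works for tiny grammars. The data and grammar are made up for illustration.

```python
import operator
from itertools import product

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}
TERMINALS = [("x", lambda x: x), ("1.0", lambda x: 1.0), ("2.0", lambda x: 2.0)]


def combine(sym, f, g):
    op = OPS[sym]
    return lambda x: op(f(x), g(x))


def enumerate_exprs(max_depth=2):
    """All expression trees up to max_depth as (string, callable) pairs (duplicates included)."""
    exprs = list(TERMINALS)
    for _ in range(max_depth):
        exprs = exprs + [
            (f"({a} {sym} {b})", combine(sym, fa, fb))
            for sym in OPS
            for (a, fa), (b, fb) in product(exprs, repeat=2)
        ]
    return exprs


def symbolic_fit(xs, ys, max_depth=2):
    def mse(f):
        return sum((f(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)
    # Lowest error wins; ties go to the shorter string, a crude simplicity preference.
    return min(enumerate_exprs(max_depth), key=lambda e: (mse(e[1]), len(e[0])))


xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [x * x + 1 for x in xs]  # made-up data generated by x^2 + 1
expr, f = symbolic_fit(xs, ys)
print(expr)
print(f(10.0))  # 101.0
```

    The candidate count grows explosively with depth, which is why practical tools like PySR search the space with evolutionary methods instead of enumerating it.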

  24. MachineLearningStocks

    Using python and scikit-learn to make stock predictions

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020).

Python scikit-learn related posts

  • Learn Machine Learning with these GitHub repositories

    5 projects | news.ycombinator.com | 15 Jan 2025
  • AlphaPy: machine learning framework built on sklearn and pandas. Supports pyfolio/xgboost/lightgbm/catboost (gradient boosting on decision trees), etc. Examples include financial market prediction/sports prediction/kaggle. Configurations are set through

    1 project | /r/algoprojects | 10 Dec 2023
  • Tradero: A tool for achieving self-funding via trading

    1 project | news.ycombinator.com | 12 Sep 2023
  • Scikit-learn Stock Prediction: using fundamental and pricing data to predict future stock returns. Sklearn's random forest classifier is trained, and the author claimed positive live trading results. Not actively maintained. Other Models - star count: 1520

    1 project | /r/algoprojects | 28 Aug 2023
  • Scikit-learn Stock Prediction: using fundamental and pricing data to predict future stock returns. Sklearn's random forest classifier is trained, and the author claimed positive live trading results. Not actively maintained. Other Models - star count: 1520

    1 project | /r/algoprojects | 27 Aug 2023
  • Hyperactive Version 4.5 Released

    1 project | news.ycombinator.com | 27 Aug 2023
  • Scikit-learn Stock Prediction: using fundamental and pricing data to predict future stock returns. Sklearn's random forest classifier is trained, and the author claimed positive live trading results. Not actively maintained. Other Models - star count: 1520

    1 project | /r/algoprojects | 26 Aug 2023

Index

What are some of the best open-source scikit-learn projects in Python? This list will help you:

# Project Stars
1 ailearning 40,001
2 data-science-ipython-notebooks 27,812
3 best-of-ml-python 19,552
4 Dask 12,909
5 mlcourse.ai 9,899
6 tpot 9,834
7 autogluon 8,345
8 sktime 8,174
9 auto-sklearn 7,718
10 featuretools 7,359
11 flower 5,393
12 orange 4,967
13 yellowbrick 4,312
14 Artificial-Intelligence-Deep-Learning-Machine-Learning-Tutorials 3,817
15 scikit-llm 3,413
16 hummingbird 3,381
17 mljar-supervised 3,106
18 igel 3,096
19 m2cgen 2,843
20 mars 2,712
21 PySR 2,592
22 modAL 2,254
23 MachineLearningStocks 1,804

