Python scikit-learn

Open-source Python projects categorized as scikit-learn

Top 23 Python scikit-learn Projects

  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • GitHub repo Dask

    Parallel computing with task scheduling

    Project mention: What does it mean to scale your python powered pipeline? | dev.to | 2022-01-03

    Dask: Distributed data frames, machine learning and more

  • GitHub repo best-of-ml-python

    🏆 A ranked list of awesome machine learning Python libraries. Updated weekly.

    Project mention: Awesome list of ML | reddit.com/r/programming | 2021-09-16

  • GitHub repo auto-sklearn

    Automated Machine Learning with scikit-learn

    Project mention: Why not AutoML every tabular data? | reddit.com/r/datascience | 2021-07-26

    Efficiency: feature engineering aside, a typical data scientist workflow involves trying out different models. Some AutoML modules, like H2O AutoML and AutoSklearn, do this for you and allow you to interpret your models. All of this saves a lot of time otherwise spent experimenting with the standard models.

  • GitHub repo sktime

    A unified framework for machine learning with time series

    Project mention: Good python time series libraries? | reddit.com/r/algotrading | 2021-12-13

    SKTime

  • GitHub repo autogluon

    AutoGluon: AutoML for Text, Image, and Tabular Data

    Project mention: What will the data science job market be like in 5 years? | reddit.com/r/datascience | 2021-08-14

    Some AutoML is getting pretty good; AutoGluon is very solid for tabular data. That being said, you still need to have your data in tabular format, and deployment still requires some effort.

  • GitHub repo yellowbrick

    Visual analysis and diagnostic tools to facilitate machine learning model selection.

    Project mention: Any interesting open projects to join? Or anyone want with some good ideas want to start one? | reddit.com/r/Python | 2021-02-05

    I have contributed to Yellowbrick in the past. https://github.com/DistrictDataLabs/yellowbrick/

  • GitHub repo orange

    🍊 📊 💡 Orange: Interactive data analysis

    Project mention: ETL Library for Python | reddit.com/r/Python | 2021-09-27

    "On the simpler side". Do you mean with a graphical interface? Then Orange would be a nice solution. https://orangedatamining.com/

  • GitHub repo igel

    A delightful machine learning tool that allows you to train, test, and use models without writing code

    Project mention: Train/fit, test, and use models without writing code | reddit.com/r/ArtificialInteligence | 2021-06-29

    Link to the repo: https://github.com/nidhaloff/igel

  • GitHub repo adversarial-robustness-toolbox

    Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams

    Project mention: adversarial-robustness-toolbox: Adversarial Robustness Toolbox (ART) - Python Library for Machine Learning Security - Evasion, Poisoning, Extraction, Inference - Red and Blue Teams | reddit.com/r/blueteamsec | 2021-12-26

  • GitHub repo hummingbird

    Hummingbird compiles trained ML models into tensor computation for faster inference.

    Project mention: Export and run models with ONNX | dev.to | 2021-09-07

    ONNX opens an avenue for direct inference using a number of languages and platforms. For example, a model could be run directly on Android to limit data sent to a third-party service. ONNX is an exciting development with a lot of promise. Microsoft has also released Hummingbird, which enables exporting traditional models (sklearn, decision trees, logistic regression, ...) to ONNX.

  • GitHub repo mars

    Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.

  • GitHub repo m2cgen

    Transform ML models into native code (Java, C, Python, Go, JavaScript, Visual Basic, C#, R, PowerShell, PHP, Dart, Haskell, Ruby, F#, Rust) with zero dependencies

  • GitHub repo mljar-supervised

    Python package for AutoML on Tabular Data with Feature Engineering, Hyper-Parameters Tuning, Explanations and Automatic Documentation

    Project mention: I'm Looking to Help Contribute, I am very confident with my skills | reddit.com/r/Python | 2021-12-02

    Automated Machine Learning (AutoML) Python package: https://github.com/mljar/mljar-supervised. You can check the list of open issues, or I can recommend some; just tell me your preferences (I'm the main contributor).

  • GitHub repo traingenerator

    🧙 A web app to generate template code for machine learning

    Project mention: Traingenerator · Streamlit | reddit.com/r/allokkio | 2021-07-06

  • GitHub repo kmodes

    Python implementations of the k-modes and k-prototypes clustering algorithms, for clustering categorical data

    Project mention: How much of data science is lying? | reddit.com/r/datascience | 2021-01-30

    They were probably looking for K-modes
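
    As a rough illustration of what k-modes does differently from k-means: it compares records by counting mismatched attributes and summarizes a cluster by its per-attribute mode instead of a mean. The sketch below uses only the standard library and is not the kmodes package's API; the function names and sample records are hypothetical.

```python
from collections import Counter

def matching_dissimilarity(a, b):
    """Count the attributes on which two categorical records differ."""
    return sum(x != y for x, y in zip(a, b))

def cluster_mode(records):
    """Return the most frequent value of each attribute across the records."""
    return tuple(Counter(column).most_common(1)[0][0] for column in zip(*records))

def assign(records, modes):
    """Assign each record to the index of its closest mode."""
    return [
        min(range(len(modes)), key=lambda k: matching_dissimilarity(record, modes[k]))
        for record in records
    ]

# Hypothetical categorical records: (color, size, shape).
records = [
    ("red", "small", "round"),
    ("red", "small", "square"),
    ("blue", "large", "square"),
    ("blue", "large", "round"),
]
modes = [("red", "small", "round"), ("blue", "large", "square")]
labels = assign(records, modes)
```

    A full k-modes run would alternate the assignment step with recomputing each cluster's mode until the assignments stop changing.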

  • GitHub repo scikit-survival

    Survival analysis built on top of scikit-learn

    Project mention: Survival analysis built on top of scikit-learn | reddit.com/r/learnmachinelearning | 2021-03-24

  • GitHub repo AlphaPy

    Automated Machine Learning [AutoML] with Python, scikit-learn, Keras, XGBoost, LightGBM, and CatBoost

    Project mention: AlphaPy: machine learning framework built on sklearn and pandas. Support pyfolio/xgboost/lightgmb/catboost(gradient boosting on decision tress) etc. Examples include financial market prediction/sports prediction/kaggle. Configurations are set though | reddit.com/r/algoprojects | 2021-12-11

  • GitHub repo iterative-stratification

    scikit-learn cross validators for iterative stratification of multilabel data

    Project mention: TypeError: unhashable type: 'list' when preparing index of labels for MultiLabelBinarizer | reddit.com/r/CodingHelp | 2021-03-31

    I need to create this so I can encode the Labels and run iterative stratification as detailed [here](https://github.com/trent-b/iterative-stratification). Once I have the index prepared, I will run MultiLabelBinarizer to encode the "Labels" list and create a matrix of those values. I will then run the stratification sampling algorithm on that matrix to determine zero-based train and test indices. The code I have below is causing an error.
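
    The error in the post title typically arises when lists are used where a hashable value is required, for example as set members or dict keys. Whether that is the cause here is an assumption, since the poster's code is not shown. The sketch below is a standard-library stand-in for the encoding step (it is not scikit-learn's MultiLabelBinarizer), using hypothetical sample labels.

```python
# Hypothetical sample data: each row carries a list of labels.
rows = [["cat", "dog"], ["dog"], ["bird", "cat"], []]

# set(rows) would raise "TypeError: unhashable type: 'list'" because
# lists cannot be set members. Flatten to individual labels instead.
classes = sorted({label for labels in rows for label in labels})

# Build a binary indicator matrix: one row per sample, one column per class.
column_of = {label: j for j, label in enumerate(classes)}
matrix = [[0] * len(classes) for _ in rows]
for i, labels in enumerate(rows):
    for label in labels:
        matrix[i][column_of[label]] = 1
```

    A matrix of this shape is the kind of input a multilabel stratification step would then split into train and test indices.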

  • GitHub repo AutoViz

    Automatically Visualize any dataset, any size with a single line of code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.

    Project mention: AutoViz: Automatically visualize any dataset, any size with one line of code | news.ycombinator.com | 2021-02-09

  • GitHub repo onnxmltools

    ONNXMLTools enables conversion of models to ONNX

    Project mention: Export and run other machine learning models | dev.to | 2021-10-14

    With the onnxmltools library, traditional models from scikit-learn, XGBoost and others can be exported to ONNX and loaded with txtai. Additionally, Hugging Face's trainer module can train generic PyTorch modules. This notebook will walk through all these examples.

  • GitHub repo SciKit-Learn Laboratory

    SciKit-Learn Laboratory (SKLL) makes it easy to run machine learning experiments. (by EducationalTestingService)

NOTE: The open-source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-01-03.

Index

What are some of the best open-source scikit-learn projects in Python? This list will help you:

Rank  Project                          Stars
1     data-science-ipython-notebooks   22,327
2     Dask                             9,424
3     best-of-ml-python                6,188
4     auto-sklearn                     5,978
5     sktime                           4,833
6     autogluon                        4,071
7     yellowbrick                      3,477
8     orange                           3,237
9     igel                             2,963
10    adversarial-robustness-toolbox   2,741
11    hummingbird                      2,723
12    mars                             2,336
13    m2cgen                           1,992
14    mljar-supervised                 1,747
15    modAL                            1,570
16    traingenerator                   1,126
17    kmodes                           956
18    scikit-survival                  710
19    AlphaPy                          697
20    iterative-stratification         619
21    AutoViz                          612
22    onnxmltools                      596
23    SciKit-Learn Laboratory          527