Python Machine learning

Open-source Python projects categorized as Machine learning

Top 23 Python Machine learning Projects

  • GitHub repo Keras

    Deep Learning for humans

    Project mention: [D] Batch normalization before or after activation function | 2021-02-23

  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Using TinyML to identify farts | 2021-02-22

    The model in question is trained using Scikit-Learn, a Python machine learning library. The audio data is loaded into NumPy arrays and split into training and testing data. The model is trained on the training data, then evaluated on the testing data to give an idea of its accuracy.

  • GitHub repo Face Recognition

    The world's simplest facial recognition api for Python and the command line

    Project mention: OpenCV or Tensorflow or both ? | 2021-02-21

    It’s called face recognition. Face recognition consists of two steps: face detection and face comparison. If you don’t have any background in this, I suggest you try the face_recognition Python module.

  • GitHub repo faceswap

    Deepfakes Software For All

    Project mention: Is there a free easy-to-use program to make deepfakes? | 2021-02-17

  • GitHub repo gym

    A toolkit for developing and comparing reinforcement learning algorithms.

    Project mention: Physics model for teaching AI Brazilian Jiu-Jitsu | 2021-02-11

  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    Project mention: Resources for learning Python from scratch specifically for data ingestion | 2021-02-13

    data science ipython notebooks

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Ask HN: What is your production ML stack like? (2021) | 2021-02-08

    Here's the ML stack I have been using for my last project:

    - Doing NLP with spaCy, as I consider it to be the most production-ready framework for NLP

    - Annotating datasets with Prodigy, a paid tool made by the spaCy team

    - Deploying the trained spaCy models onto NLP Cloud

    - Using the models through the NLP Cloud API in production to enrich my Django application

  • GitHub repo NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    Project mention: What are some classification tasks where BERT-based models don't work well? In a similar vein, what are some generative tasks where fine-tuning GPT-2/LM does not work well? | 2021-02-21

    One place to start is NLP-progress, if leaderboards are your thing. If the model at the top of a leaderboard is not transformer-based and one further down is, you have your answer.

  • GitHub repo Ray

    An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

    Project mention: How to get my multi-agents more collaborative? | 2021-02-15

    QMIX is indeed a great paper. I'm planning on using it with RLlib on my env; however, it takes some work to adapt it and understand the subtleties ;) (such as the agent groups)

  • GitHub repo PaddlePaddle

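    PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』 core framework: high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)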

    Project mention: Alternatives to google collab? | 2021-01-31

  • GitHub repo streamlit

    Streamlit — The fastest way to build data apps in Python

    Project mention: Which GUI framework do you/would you use for which purposes and why? | 2021-02-13

    streamlit (oriented toward data science)

  • GitHub repo Prophet

    Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

    Project mention: [D] Is there any alternative to DL to multivariate time series forecasting? | 2021-02-23

    IIRC Facebook's Prophet supports multivariate forecasting as well.

  • GitHub repo pytorch-lightning

    The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

    Project mention: DDP with model parallelism with multi host multi GPU system | 2021-02-07

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: Koan: A word2vec negative sampling implementation with correct CBOW update | 2021-01-02

    Apparently it did:

  • GitHub repo horovod

    Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

    Project mention: SKLean, TensorFlow, etc vs Spark ML? | 2021-02-12

    I'm the maintainer for an open source project called Horovod that allows you to distribute deep learning training (e.g., TensorFlow) on platforms like Spark.

  • GitHub repo imgaug

    Image augmentation for machine learning experiments.

    Project mention: Bounding boxes do not completely wrap the objects with YOLOv4 | 2021-02-06

    I would also recommend trying the TensorFlow Object Detection Model pipeline with augmentation. It worked for me in a similar use case where I had to localise logos on documents.

  • GitHub repo EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts, including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    Project mention: Using Google's OCR API with Puppeteer for Visual Testing | 2021-02-08

    There are multiple open-source OCR tools, like pytesseract or EasyOCR, that can be used to integrate OCR functionality into a program. However, these tools require significant configuration before they produce results with an acceptable level of accuracy.

  • GitHub repo NLTK

    NLTK Source

    Project mention: Wordnet and Sexism | 2021-01-03

  • GitHub repo TFLearn

    Deep learning library featuring a higher-level API for TensorFlow.

  • GitHub repo nni

    An open source AutoML toolkit for automating the machine learning lifecycle, including feature engineering, neural architecture search, model compression and hyper-parameter tuning.

    Project mention: How we were able to achieve hyper-parameter tuning (HPT) for deep learning workflows at 1.5x faster in our clusters and 3x cheaper on AWS | 2021-02-23

    To tackle the problem of long and expensive HPT workflows, our team at Petuum collaborated with Microsoft to integrate AdaptDL with Neural Network Intelligence (NNI). AdaptDL is an open-source tool in the CASL (Composable, Automatic, and Scalable Learning) ecosystem. AdaptDL offers adaptive resource management for distributed clusters and reduces the cost of deep learning workloads ranging from a few training/tuning trials to thousands. NNI, from the Microsoft open-source community, is a toolkit for automatic machine learning (AutoML) and hyper-parameter tuning.

  • GitHub repo awesome-aws

    A curated list of awesome Amazon Web Services (AWS) libraries, open source repos, guides, blogs, and other resources. Featuring the Fiery Meter of AWSome.

  • GitHub repo bert-as-service

    Mapping a variable-length sentence to a fixed-length vector using a BERT model

    Project mention: Needed 100% to pass a safety quiz, need to wait a week to retake | 2021-01-12

    You joke but

  • GitHub repo fashion-mnist

    A MNIST-like fashion product database.

    Project mention: [P] Why are stacked autoencoders still a thing? | 2021-01-25
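
As a rough illustration of the scikit-learn workflow described in the TinyML mention above (load data into NumPy arrays, split it into training and testing sets, train, then test), here is a minimal sketch. The synthetic data, the 200 x 40 feature shape, and the choice of RandomForestClassifier are assumptions made for illustration; the quoted post does not say which model or features were used.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for extracted audio features:
# 200 clips, 40 features each, with a synthetic binary label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 40))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

# Hold out 25% of the clips for testing, keeping class proportions similar.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Assumed model choice; any scikit-learn classifier exposes the same fit/predict API.
model = RandomForestClassifier(n_estimators=50, random_state=0)
model.fit(X_train, y_train)

# Score the model on data it has not seen during training.
accuracy = accuracy_score(y_test, model.predict(X_test))
print(f"held-out accuracy: {accuracy:.2f}")
```

Evaluating on the held-out split rather than on the training data is what gives a fair estimate of accuracy.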


NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-02-23.


What are some of the best open-source Machine learning projects in Python? This list will help you:

# Project Stars
1 Keras 50,757
2 scikit-learn 44,626
3 Face Recognition 38,655
4 faceswap 34,226
5 gym 23,505
6 data-science-ipython-notebooks 20,249
7 spaCy 19,619
8 NLP-progress 17,810
9 Ray 14,865
10 PaddlePaddle 14,351
11 streamlit 13,389
12 Prophet 12,283
13 pytorch-lightning 12,092
14 gensim 11,750
15 horovod 10,835
16 imgaug 10,713
17 EasyOCR 10,671
18 NLTK 9,645
19 TFLearn 9,522
20 nni 9,102
21 awesome-aws 8,923
22 bert-as-service 8,904
23 fashion-mnist 8,834