Python Machine Learning

Open-source Python projects categorized as Machine Learning

Top 23 Python Machine Learning Projects

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

    Project mention: GPU Comparisons: RTX 6000 ADA vs A100 80GB vs 2x 4090s | reddit.com/r/deeplearning | 2022-12-02

    Looked into this last night and yeah, NVLink works the way you described because of misleading marketing: no contiguous memory pool, just a faster interconnect, so maybe model parallelisation scales a bit better, but you still have to implement it. Also saw an example where some PyTorch GPT2 models scaled horrifically in training with multiple PCIe V100s and 3090s that didn’t have NVLink, so that’s a caveat with dual 4090s not having NVLink.

  • Keras

    Deep Learning for humans

    Project mention: 65 Blog Posts to Learn Data Science | dev.to | 2022-11-30

    Hello world. This tutorial is a gentle introduction to building a modern text recognition system using deep learning in 15 minutes. It will teach you the main ideas of how to use Keras and Supervisely for this problem. This guide is for anyone who is interested in using deep learning for text recognition in images but has no idea where to start.

  • scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Scaling PostgresML to 1M Requests per Second | news.ycombinator.com | 2022-11-11

    Of course. The paper is at https://arxiv.org/abs/1408.3060.

    > Our method applies to any translation invariant and any dot-product kernel, such as the popular RBF kernels and polynomial kernels. We prove that the approximation is unbiased and has low variance. Experiments show that we achieve similar accuracy to full kernel expansions and Random Kitchen Sinks while being 100x faster and using 1000x less memory. These improvements, especially in terms of memory usage, make kernel methods more practical for applications that have large training sets and/or require real-time prediction.

    Sadly Fastfood didn't quite make it into Scikit[1], but did land in scikit-learn-extra[2].

    1. https://github.com/scikit-learn/scikit-learn/pull/3665. A shame, Scikit's equivalents scale very poorly.

    2. https://scikit-learn-extra.readthedocs.io/en/stable/generate...
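
    The quoted abstract compares Fastfood against Random Kitchen Sinks (random Fourier features). As a rough illustration of that baseline only, the sketch below approximates an RBF kernel with random cosine features in plain Python. It is not Fastfood itself, which replaces the dense Gaussian projection with a structured, Hadamard-based one, and it is not scikit-learn's implementation. The function names, `gamma`, the feature count, and the sample vectors are assumptions made for illustration.

```python
import math
import random


def rbf_kernel(x, y, gamma):
    """Exact RBF kernel: exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)


def make_random_features(dim, n_features, gamma, seed=0):
    """Draw the random projection once and return a feature-map function."""
    rng = random.Random(seed)
    # For exp(-gamma * ||d||^2), frequencies are Gaussian with std sqrt(2 * gamma).
    scale = math.sqrt(2.0 * gamma)
    weights = [[rng.gauss(0.0, scale) for _ in range(dim)]
               for _ in range(n_features)]
    offsets = [rng.uniform(0.0, 2.0 * math.pi) for _ in range(n_features)]
    norm = math.sqrt(2.0 / n_features)

    def feature_map(x):
        # z_i(x) = sqrt(2 / D) * cos(w_i . x + b_i)
        return [norm * math.cos(sum(w_j * x_j for w_j, x_j in zip(w, x)) + b)
                for w, b in zip(weights, offsets)]

    return feature_map


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


# Hypothetical sample points, chosen only to exercise the code.
x = [0.2, -0.1, 0.4]
y = [0.0, 0.3, 0.1]
gamma = 0.5

phi = make_random_features(dim=3, n_features=4000, gamma=gamma)
exact = rbf_kernel(x, y, gamma)
approx = dot(phi(x), phi(y))
print(f"exact={exact:.4f} approx={approx:.4f}")
```

    The inner product of the two feature vectors is an unbiased estimate of the kernel value, and its error shrinks as the number of random features grows. The cost of the dense projection used here is what structured methods such as Fastfood aim to reduce.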

  • Face Recognition

    The world's simplest facial recognition api for Python and the command line

    Project mention: Any algorithm to get coordinates of classified face? | reddit.com/r/neuralnetworks | 2022-11-12

    face_recognition - fairly easy to install as long as dlib compiles for you. You should be able to get your face location without trouble. This is the oldest and most mature option, but it doesn't have as many features and might not have the accuracy of other models. It can also do face recognition.

  • faceswap

    Deepfakes Software For All

    Project mention: [D] How is it checked if models do not just memorize their training examples? | reddit.com/r/MachineLearning | 2022-04-28

    But there's a nice survey on arXiv here of various deepfake / face swap methods. Some of the methods listed in the table on page 4, such as Faceswap and Faceswap-GAN, apparently use encoder-decoder models. I think Faceswap-GAN was the one that I was thinking of in particular; apparently it adds a perceptual loss and an adversarial loss to an autoencoder.

  • DeepFaceLab

    DeepFaceLab is the leading software for creating deepfakes.

    Project mention: Margot Robbie - Dirndl Pantene Pro-V | reddit.com/r/SFWdeepfakes | 2022-11-15

    I am using https://github.com/iperov/DeepFaceLab and Adobe video and picture editing software. You will find all the information on GitHub. The hardware is mainly a Zotac RTX 3090 24GB combined with an AMD Ryzen 9 5950X and 32GB RAM.

  • yolov5

    YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite

    Project mention: YOLOv5 SOTA Realtime Instance Segmentation | news.ycombinator.com | 2022-11-22

  • gym

    A toolkit for developing and comparing reinforcement learning algorithms.

    Project mention: Pusher task on mujoco/pybulletenv | reddit.com/r/reinforcementlearning | 2022-11-19

  • spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Classification + Python + Spacy | dev.to | 2022-12-02

    One approach to string classification is to use a library like spaCy to perform natural language processing (NLP) on the string, and then use a machine learning algorithm to classify the resulting data. Here is an example of how you might do this in Python:
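
    A minimal sketch of that two-step approach, not the code from the original post: first turn each string into tokens, then train a classifier on the token counts. To keep it dependency-free, a regex tokenizer stands in for the spaCy step and a small hand-written Naive Bayes model stands in for the machine learning algorithm. The `tokenize` helper, the `NaiveBayesTextClassifier` class, and the toy review data are all hypothetical.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Stand-in for the NLP step.

    With spaCy installed, the lowercased tokens of spacy.blank("en")(text)
    could be used here instead of this regex.
    """
    return re.findall(r"[a-z']+", text.lower())


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one smoothing over token counts."""

    def fit(self, texts, labels):
        self.token_counts = defaultdict(Counter)  # label -> token frequencies
        self.label_counts = Counter(labels)       # label -> number of documents
        self.vocab = set()
        for text, label in zip(texts, labels):
            tokens = tokenize(text)
            self.token_counts[label].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, text):
        tokens = tokenize(text)
        total_docs = sum(self.label_counts.values())
        best_label, best_score = None, -math.inf
        for label, doc_count in self.label_counts.items():
            counts = self.token_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            # Log prior plus smoothed log likelihood of each token.
            score = math.log(doc_count / total_docs)
            for tok in tokens:
                score += math.log((counts[tok] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training data, invented for illustration only.
train_texts = [
    "the battery died after one day",
    "screen cracked and support was rude",
    "great battery life and a bright screen",
    "fast shipping and excellent support",
]
train_labels = ["negative", "negative", "positive", "positive"]

clf = NaiveBayesTextClassifier().fit(train_texts, train_labels)
print(clf.predict("excellent battery life"))
print(clf.predict("support was rude"))
```

    Keeping tokenization in its own function means the NLP step can be swapped for a spaCy pipeline (for example, to add lemmatization) without touching the classifier.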

  • data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

  • Ray

    Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a toolkit of libraries (Ray AIR) for accelerating ML workloads.

    Project mention: Think about it for a second | reddit.com/r/mathmemes | 2022-10-19

    https://ray.io (just dropping the link)

  • ML-From-Scratch

    Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.

    Project mention: Coding K-Means Clustering using Python and NumPy | dev.to | 2022-09-22

    ML From Scratch - An excellent GitHub repository containing implementations of many machine learning models and algorithms. Easy to understand and highly recommended.

  • streamlit

    Streamlit — The fastest way to build data apps in Python

    Project mention: Advent of Code - Day Downloader - Website | reddit.com/r/adventofcode | 2022-11-27

    I made a Streamlit web page in Python to select and download the question and/or input of multiple days on Advent of Code.

  • NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    Project mention: NLP research status | reddit.com/r/datascience | 2022-10-15

  • lightning

    Build and train PyTorch models and connect them to the ML lifecycle using Lightning App templates, without handling DIY infrastructure, cost management, scaling, and other headaches.

    Project mention: We just release a complete open-source solution for accelerating Stable Diffusion pretraining and fine-tuning! | reddit.com/r/StableDiffusion | 2022-11-11

    Our codebase for the diffusion models builds heavily on OpenAI's ADM codebase, lucidrains, Stable Diffusion, Lightning and Hugging Face. Thanks for open-sourcing!

  • OpenBBTerminal

    Investment Research for Everyone, Anywhere.

    Project mention: Yield curve (ycrv) | reddit.com/r/openBB | 2022-11-30

    There currently is a PR open that fixes this. Expect the functionality to work in the next release which should be released soon. If you are using the Python version, you should be able to pull the changes in even sooner.

  • jina

    🔮 The most advanced MLOps platform for multimodal AI on the cloud · Neural Search · Creative AI · Cloud Native

    Project mention: Have you used Jina for multi-modal applications? | dev.to | 2022-10-24

    How would you build a multi-modal application? I just noticed the release of Jina, an MLOps framework that empowers anyone to build cross-modal and multi-modal applications on the cloud. It uplifts a PoC into a production-ready service. Jina handles the infrastructure complexity, making advanced solution engineering and cloud-native technologies accessible to every developer. If you have tried it, please let me know what you think of it. Thanks!

  • EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    Project mention: [P]Modern open-source OCR capabilities and which model to choose | reddit.com/r/MachineLearning | 2022-11-18

    I've used EasyOCR for number recognition tasks. Works fairly well. https://github.com/JaidedAI/EasyOCR

  • d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 400 universities from 60 countries including Stanford, MIT, Harvard, and Cambridge.

    Project mention: How to pre-train BERT on different objective tasks using HuggingFace | reddit.com/r/deeplearning | 2022-04-10

    There may be a library for pre-training BERT models in Hugging Face, but I suggest that you train a BERT model in native PyTorch to understand the details. Limu's course is recommended for you.

  • Prophet

    Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

    Project mention: [D] my PhD advisor "machine learning researchers are like children, always re-discovering things that are already known and make a big deal out of it." | reddit.com/r/MachineLearning | 2022-11-17

    Ok I feel like I’m taking crazy pills. Just look at the prophet source code for Fourier transforms around line 495 here and compare https://github.com/facebook/prophet/blob/main/R/R/prophet.R

  • rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Show HN: Flex – transpile natural language to a programming language | news.ycombinator.com | 2022-11-10

    At the moment it can recognise the type of statements in the training data set [1] and transpile them to Python, Java or C++ using the mappings defined here [2].

    This is very different from how Codex/Autopilot work as it is trained using an NLU framework [3] which is usually used for training chatbots.

    [1]: https://github.com/Flex-lang/transpiler/tree/master/transpil...

    [2]: https://github.com/Flex-lang/transpiler/tree/master/transpil...

    [3]: https://github.com/RasaHQ/rasa

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

    Project mention: FauxPilot – an open-source GitHub Copilot server | news.ycombinator.com | 2022-08-02

    And then pass that my_code.json as the dataset name.

    [1] https://github.com/huggingface/datasets

  • recommenders

    Best Practices on Recommendation Systems

    Project mention: There is framework for everything. | reddit.com/r/ProgrammerHumor | 2022-08-04

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2022-12-02.

Index

What are some of the best open-source Machine Learning projects in Python? This list will help you:

Rank  Project                         Stars
1     transformers                    75,115
2     Keras                           56,758
3     scikit-learn                    52,227
4     Face Recognition                46,626
5     faceswap                        42,849
6     DeepFaceLab                     35,961
7     yolov5                          33,311
8     gym                             29,100
9     spaCy                           24,644
10    data-science-ipython-notebooks  24,323
11    Ray                             22,890
12    ML-From-Scratch                 21,698
13    streamlit                       21,567
14    NLP-progress                    21,114
15    lightning                       20,798
16    OpenBBTerminal                  18,101
17    jina                            16,764
18    EasyOCR                         16,435
19    d2l-en                          15,680
20    Prophet                         15,217
21    rasa                            15,117
22    datasets                        14,818
23    recommenders                    14,582