Python Machine learning

Open-source Python projects categorized as Machine learning

Top 23 Python Machine learning Projects

  • GitHub repo Keras

    Deep Learning for humans

    Project mention: [Project] I'm trying to implement StyleGAN2 in Keras to better understand its structure and just AAAAAAAA | reddit.com/r/learnmachinelearning | 2021-05-17
  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Is there a way to map cluster centers back to a dataframe? | reddit.com/r/learnpython | 2021-05-19

    To avoid the issue with convergence (and the discrepancy between the labels_ and cluster_centers_), you can set tol=0, though this can of course lead to issues if convergence is a problem. There was an issue about it here. Assuming it's converged, then the order is fine.
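As a minimal sketch of the mapping discussed above: once a k-means fit has converged, each row's label is an index into the array of centers, so finding the center for a row is plain indexing. The lists below are hypothetical stand-ins for scikit-learn's `labels_` and `cluster_centers_` attributes, the values are made up for illustration, and plain dictionaries stand in for dataframe rows.

```python
# Hypothetical stand-ins for a fitted KMeans model's attributes.
# In scikit-learn these would be `model.labels_` and `model.cluster_centers_`.
labels = [0, 1, 0, 2]                                   # one cluster index per row
cluster_centers = [(1.0, 1.5), (8.0, 8.5), (4.0, 0.5)]  # one center per cluster

# Made-up rows standing in for a dataframe.
rows = [
    {"x": 1.1, "y": 1.4},
    {"x": 7.9, "y": 8.6},
    {"x": 0.9, "y": 1.6},
    {"x": 4.2, "y": 0.4},
]


def attach_centers(rows, labels, centers):
    """Return copies of the rows with their cluster label and center attached."""
    if len(rows) != len(labels):
        raise ValueError("expected exactly one label per row")
    enriched = []
    for row, label in zip(rows, labels):
        cx, cy = centers[label]  # the label is an index into the centers
        enriched.append({**row, "cluster": label, "center_x": cx, "center_y": cy})
    return enriched


rows_with_centers = attach_centers(rows, labels, cluster_centers)
for row in rows_with_centers:
    print(row)
```

With NumPy arrays the same lookup is usually a single indexing operation, such as `model.cluster_centers_[model.labels_]`, which yields one center per row.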

  • GitHub repo Face Recognition

    The world's simplest facial recognition api for Python and the command line

    Project mention: Need ideas regarding Face Recognition | reddit.com/r/computervision | 2021-06-02
  • GitHub repo faceswap

    Deepfakes Software For All

    Project mention: Emma Watson | reddit.com/r/KGBTR | 2021-05-17

    You can do it with this app: https://github.com/deepfakes/faceswap

  • GitHub repo gym

    A toolkit for developing and comparing reinforcement learning algorithms.

    Project mention: In OpenAi gym, what does '.n' in 'env.observation_space.n' methods mean? | reddit.com/r/learnpython | 2021-06-14

    n is the number of observations possible in the observation space. See, for example, the HotterColder example, which has an n of 4 with the four different possibilities outlined in the docstring.
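To illustrate what `.n` represents, here is a simplified stand-in for a discrete space. The `Discrete` class below is a hypothetical, stripped-down imitation written for this example, not gym's actual implementation. It assumes only that a discrete space with `n` values covers the integers `0` through `n - 1`.

```python
import random


class Discrete:
    """Simplified stand-in for a discrete space: the integers 0 .. n-1."""

    def __init__(self, n):
        if n <= 0:
            raise ValueError("n must be positive")
        self.n = n  # number of distinct values in the space

    def contains(self, value):
        """Check whether `value` is a valid member of the space."""
        return isinstance(value, int) and 0 <= value < self.n

    def sample(self, rng=random):
        """Draw a uniformly random member of the space."""
        return rng.randrange(self.n)


# A space with four possible observations, as in the HotterColder example.
observation_space = Discrete(4)
print(observation_space.n)            # number of possible observations
print(observation_space.contains(3))  # 3 is the largest valid value
print(observation_space.contains(4))  # 4 falls outside the space
```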

  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    Project mention: Beginner in Python for Data Science | reddit.com/r/learnpython | 2020-12-27

    data science ipython notebooks

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: Resume Advice Thread - June 08, 2021 | reddit.com/r/cscareerquestions | 2021-06-08

    "metadata" is "meta-data", "Spacy" is formally "spaCy", "Node" is formally "Node.js", "Mongo" is formally "MongoDB", "Websockets" is (possibly) "WebSocket", "twitter" is formally "Twitter", and "Javascript" is formally "JavaScript".

  • GitHub repo NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    Project mention: What are state-of-the-art methods for abstractive text summarization ? | reddit.com/r/LanguageTechnology | 2021-06-03
  • GitHub repo Ray

    An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

    Project mention: Ray 1.4.0 | news.ycombinator.com | 2021-06-08
  • GitHub repo PaddlePaddle

    PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)

    Project mention: I have issue with only __habs for half datatype? Please help! | reddit.com/r/CUDA | 2021-06-15
  • GitHub repo streamlit

    Streamlit — The fastest way to build data apps in Python

    Project mention: Jupyter notebooks for dashboarding? | reddit.com/r/BusinessIntelligence | 2021-06-13
  • GitHub repo pytorch-lightning

    The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

    Project mention: [P] An introduction to PyKale https://github.com/pykale/pykale, a PyTorch library that provides a unified pipeline-based API for knowledge-aware multimodal learning and transfer learning on graphs, images, texts, and videos to accelerate interdisciplinary research. Welcome feedback/contribution! | reddit.com/r/MachineLearning | 2021-04-25

    If you want a good example for reference, take a look at PyTorch Lightning's README (https://github.com/PyTorchLightning/pytorch-lightning). It answers the three questions of "what is this", "why should I care", and "how do I use it" almost instantly.

  • GitHub repo Prophet

    Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

    Project mention: [D] Unfair Comparison: Neural Networks vs Taylor Polynomials and Fourier Series | reddit.com/r/MachineLearning | 2021-06-14

    Trigonometric polynomials are still used all the time in domains where periodic behavior is expected. As just one example, the very popular prophet forecasting library uses sinusoidal expansion of a time series as one of its core features.
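The sinusoidal expansion mentioned above can be sketched as a truncated Fourier series: a periodic component is modeled as a weighted sum of sine and cosine terms at multiples of a base frequency. This is a generic illustration, not Prophet's code. The function names, the 7-day period, and the coefficient values are assumptions chosen for the example.

```python
import math


def fourier_features(t, period, order):
    """Return [sin(2*pi*k*t/period), cos(2*pi*k*t/period)] for k = 1..order."""
    features = []
    for k in range(1, order + 1):
        angle = 2.0 * math.pi * k * t / period
        features.append(math.sin(angle))
        features.append(math.cos(angle))
    return features


def seasonal_component(t, period, coefficients):
    """Evaluate a truncated Fourier series with the given coefficients.

    `coefficients` holds one weight per feature, so its length is 2 * order.
    """
    order = len(coefficients) // 2
    features = fourier_features(t, period, order)
    return sum(c * f for c, f in zip(coefficients, features))


# Made-up coefficients for an order-2 expansion with a 7-day period.
weekly_coefficients = [0.8, -0.3, 0.1, 0.25]
for day in range(8):
    value = seasonal_component(day, period=7.0, coefficients=weekly_coefficients)
    print(day, round(value, 4))
```

In a forecasting setting the coefficients would be fitted to data rather than chosen by hand. A higher order lets the curve follow sharper seasonal patterns at the cost of more parameters.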

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: The Levenshtein Distance in Production | news.ycombinator.com | 2021-06-06

    > Problem statement: the Levenshtein distance is a string metric for measuring the difference between two sequences

    Another variant is "I have a bunch of words (a dictionary) and one query word, and want to find all words from the dictionary that are close to the query word".

    This leads to an interesting class of problems, because you can do clever things where you precompute search structures (Levenshtein automata [0]) from the dictionary. The similarity queries then run (much) faster – in production, performance matters.

    We recently merged a PR like that into Gensim [1].

    This gave a ~1,500x speed-up compared to naively comparing all pairwise strings with Levenshtein distance. A difference between the training step running for years (=unusable) and minutes.

    [0] http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levensht...

    [1] https://github.com/RaRe-Technologies/gensim/pull/3146
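For reference, the naive baseline described above can be sketched as follows: compute the Levenshtein distance between the query and every dictionary word, and keep the words within a threshold. This is the slow pairwise approach, not the automaton-based method from the Gensim PR. The function names and the sample words are made up for the example.

```python
def levenshtein(a, b):
    """Edit distance between strings a and b (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def close_words(query, dictionary, max_distance):
    """Naively scan the whole dictionary for words near the query."""
    return [w for w in dictionary if levenshtein(query, w) <= max_distance]


words = ["kitten", "sitting", "mitten", "bitten", "garden"]
print(levenshtein("kitten", "sitting"))
print(close_words("kitten", words, max_distance=1))
```

Every query here costs one full distance computation per dictionary word, which is the cost that precomputed search structures are meant to avoid.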

  • GitHub repo EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    Project mention: Best OCR for keras CNN trained handwritten character model? | reddit.com/r/tensorflow | 2021-06-03

    There are many possible approaches to OCR. You might want to take a look at the pipeline of EasyOCR at https://github.com/JaidedAI/EasyOCR.

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants (by RasaHQ)

    Project mention: Building a Social Engineering Chatbot for Cyber Security Awareness | reddit.com/r/artificial | 2021-04-05

    There is a Python framework named Rasa. It's really easy to use and is open source. I use it at work. As for the frontend, you can use the Botfront UI. https://github.com/RasaHQ/rasa https://github.com/botfront/rasa-webchat

  • GitHub repo imgaug

    Image augmentation for machine learning experiments.

    Project mention: [UPDATE!] Recognize trinkets with Isaac Item Recognizer! And also a few useful features in my newest update. | reddit.com/r/bindingofisaac | 2021-06-13

    I have to improve my dataset with more backgrounds featuring obstacles. At the moment I'm working on creating a dataset with both items and trinkets, and I'm planning on using https://github.com/aleju/imgaug which will replace most of the stuff I'm doing with PIL.

  • GitHub repo horovod

    Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

    Project mention: SKLean, TensorFlow, etc vs Spark ML? | reddit.com/r/apachespark | 2021-02-12

    I'm the maintainer for an open source project called Horovod that allows you to distribute deep learning training (e.g., TensorFlow) on platforms like Spark.

  • GitHub repo the-gan-zoo

    A list of all named GANs!

    Project mention: gan-generated paredolia + abstract watercolor + color fuckery = | reddit.com/r/MediaSynthesis | 2021-05-15

    Here's a link to similar programs: https://github.com/hindupuravinash/the-gan-zoo

  • GitHub repo flair

    A very simple framework for state-of-the-art Natural Language Processing (NLP)

    Project mention: Advice for how to approach classifying apartment posts on facebook? | reddit.com/r/LanguageTechnology | 2021-06-04

    For example, my first approach to the pet sentences would be to label every sentence in the text corpus that contains the relevant information as either yes or no. You would then convert this to a three-class tag set, something like ["pet allowed", "pet not allowed", "irrelevant"]. You could then try out a model based on SentenceBert, other sentence-level embeddings/language models, or 1D CNNs for this. flairNLP (https://github.com/flairNLP/flair) is a small framework that provides convenient high-level access to different common language models and integrates perfectly with PyTorch.

  • GitHub repo recommenders

    Best Practices on Recommendation Systems

    Project mention: Opinion on choice of model - Recommender System | reddit.com/r/datascience | 2021-04-10

    Then I tried to find some more advanced models, and I found this really good list, and in there I found the Microsoft one. So that's where we are now, with a bunch of different models and not much documentation or many tutorials out there.

  • GitHub repo d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 175 universities.

    Project mention: I created a way to learn machine learning through Jupyter | reddit.com/r/learnmachinelearning | 2021-04-30

    There are actually some online books and courses built on Jupyter Notebook (the [Dive into Deep Learning book](https://github.com/d2l-ai/d2l-en), for example). However, yours is more detailed and could really help beginners.

  • GitHub repo NLTK

    NLTK Source

    Project mention: Do programmers save chunks of code for repeated use? | reddit.com/r/learnpython | 2021-04-27

    Around 782 - https://github.com/nltk/nltk/blob/develop/nltk/corpus/reader/framenet.py

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-06-15.

Index

What are some of the best open-source Machine learning projects in Python? This list will help you:

Rank Project Stars
1 Keras 51,308
2 scikit-learn 46,078
3 Face Recognition 40,311
4 faceswap 37,182
5 gym 24,419
6 data-science-ipython-notebooks 21,197
7 spaCy 20,639
8 NLP-progress 18,612
9 Ray 16,206
10 PaddlePaddle 15,761
11 streamlit 14,832
12 pytorch-lightning 13,792
13 Prophet 12,871
14 gensim 12,156
15 EasyOCR 11,678
16 rasa 11,527
17 imgaug 11,302
18 horovod 11,297
19 the-gan-zoo 10,621
20 flair 10,448
21 recommenders 10,304
22 d2l-en 10,071
23 NLTK 9,931