Python Machine Learning

Open-source Python projects categorized as Machine Learning

Top 23 Python Machine Learning Projects

  • GitHub repo Keras

    Deep Learning for humans

    Project mention: Data Science with JavaScript: What we've learned so far? | news.ycombinator.com | 2021-09-09

  • GitHub repo scikit-learn

    scikit-learn: machine learning in Python

    Project mention: Scikit-Learn Version 1.0 | news.ycombinator.com | 2021-09-14

    Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the GitHub repo is 1.0.rc2.

    https://github.com/scikit-learn/scikit-learn/releases/tag/1....

  • GitHub repo Face Recognition

    The world's simplest facial recognition api for Python and the command line

    Project mention: Photo auto-tagging and facial rec scripts/apps | reddit.com/r/selfhosted | 2021-09-09

    There are quite a few decent Python scripts/libraries that would do the job quite simply, such as https://github.com/ageitgey/face_recognition, so you could script something up reasonably easily.

  • GitHub repo faceswap

    Deepfakes Software For All

    Project mention: Is it just me, or is faceswap installation trolling me? | reddit.com/r/faceswap | 2021-08-18

    It keeps getting stuck at either "fatal: unable to access 'https://github.com/deepfakes/faceswap.git/':" or "Please run this script with Python version 3.7 or 3.8 64bit and try again."

  • GitHub repo gym

    A toolkit for developing and comparing reinforcement learning algorithms.

    Project mention: The third party environment list is now fixed up and maintained- please submit PRs for any missing environments you're aware of | reddit.com/r/reinforcementlearning | 2021-09-23

  • GitHub repo data-science-ipython-notebooks

    Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.

    Project mention: Beginner in Python for Data Science | reddit.com/r/learnpython | 2020-12-27

    data science ipython notebooks

  • GitHub repo spaCy

    💫 Industrial-strength Natural Language Processing (NLP) in Python

    Project mention: how to find the word "know" but ignore it if it's "don't know" in a text? | reddit.com/r/learnpython | 2021-08-30

    The answers given using regular expressions certainly meet the criteria you've specified. However, if you really want to do language processing, you may want to look into a tool like spaCy.
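
    For the narrow question in the post title, the regular-expression route can be sketched as follows. This is only an illustration under stated assumptions: the names KNOW_PATTERN and find_know are hypothetical, and only the negations "don't" and "do not" with a straight apostrophe are handled.

```python
import re

# Match the standalone word "know" unless it is directly preceded by
# "don't " or "do not ". Each lookbehind is fixed-width, as Python's
# re module requires. Matching is case-insensitive.
KNOW_PATTERN = re.compile(r"(?<!\bdon't )(?<!\bdo not )\bknow\b", re.IGNORECASE)


def find_know(text):
    """Return the start offset of every 'know' that is not negated."""
    return [match.start() for match in KNOW_PATTERN.finditer(text)]


sample = "I know that you don't know what they know."
print(find_know(sample))  # offsets of the first and last "know" only
```

    A pattern like this breaks down quickly on real text (curly apostrophes, "do not really know", other negations), which is where a full NLP library becomes the better choice.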

  • GitHub repo NLP-progress

    Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.

    Project mention: [P] NLP "tl;dr" Notes on Transformers | reddit.com/r/MachineLearning | 2021-08-12

    It would also be cool to have some charts with parameter density and even overall effectiveness (a tl;dr version of SOTA-trackers, maybe?) if that doesn't prove too infeasible.

  • GitHub repo Ray

    An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

    Project mention: Writing your First Distributed Python Application with Ray (without multiprocessing) | reddit.com/r/Python | 2021-08-23

    Here is an older discussion on dask vs ray from the creators of both projects: https://github.com/ray-project/ray/issues/642

  • GitHub repo PaddlePaddle

    PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)

    Project mention: I have issue with only __habs for half datatype? Please help! | reddit.com/r/CUDA | 2021-06-15

  • GitHub repo streamlit

    Streamlit — The fastest way to build data apps in Python

    Project mention: Get Paid To Train Magic | dev.to | 2021-09-24

    Streamlit.io is a really cool Python library I found that helps create data tools to put on the web really quickly. It was the quickest way for me to make an MVP, so I chose it and focused primarily on the backend.

  • GitHub repo pytorch-lightning

    The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.

    Project mention: [P] An introduction to PyKale https://github.com/pykale/pykale, a PyTorch library that provides a unified pipeline-based API for knowledge-aware multimodal learning and transfer learning on graphs, images, texts, and videos to accelerate interdisciplinary research. Welcome feedback/contribution! | reddit.com/r/MachineLearning | 2021-04-25

    If you want a good example for reference, take a look at PyTorch Lightning's readme (https://github.com/PyTorchLightning/pytorch-lightning). It answers the 3 questions of "what is this", "why should I care", and "how do I use it" almost instantly.

  • GitHub repo Prophet

    Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.

    Project mention: Deeney: "I get 30-40 racist messages a week, easily. On pictures of me, my missus, my kids. And I'm a marmite individual, some like me some don't, you can talk about my footballing ability as much as you want - I just don't understand why you have to talk about the colour of my skin." | reddit.com/r/soccer | 2021-09-21

    These companies will tell advertisers that their data science teams and their algorithms are the best in the world. They hire some of the best and brightest minds, and their clustering algorithms and forecast libraries are some top-tier solutions.

  • GitHub repo rasa

    💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

    Project mention: Building a chatbot - How should I approach this? | reddit.com/r/learnpython | 2021-08-12

    Like u/Hungry_Check_9153 says about your image of chatbots working, I recommend looking at Rasa, which is an open source Python chatbot. To give yourself an idea of the sheer scope of such a project, take a look at their GitHub. Building a chatbot using Rasa may be a good first step and offers plenty of experience writing and learning Python code.

  • GitHub repo EasyOCR

    Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.

    Project mention: How would one go about recognizing a text (date) in an image? | reddit.com/r/MLQuestions | 2021-09-07

    Try running https://github.com/JaidedAI/EasyOCR, which reads all the text from a given image. Then loop over the words read and see whether it reads your date properly or splits it into multiple words. Even if it does split it, you can easily find the year (you might need to do some postprocessing on what the OCR reads, e.g. transform l to 1). The nice thing is that you get the bounding boxes of where the text was read, so you can use some sort of postprocessing on bbox locations and words read to find the month and day (e.g. I read 25 at the same y index as the year but the x index is to the left, so it's probably the day). Good luck :)
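
    The postprocessing idea in this comment can be sketched in plain Python. Everything here is an assumption made for illustration: the ocr_results list is made-up data shaped as (bounding box corners, text, confidence) tuples, and fix_digits, top_left, and find_date_parts are hypothetical helper names, not part of any OCR library.

```python
import re

# Made-up OCR output for illustration: (bounding box corners, text, confidence).
ocr_results = [
    ([(10, 50), (40, 50), (40, 70), (10, 70)], "25", 0.91),
    ([(50, 50), (90, 50), (90, 70), (50, 70)], "Sep", 0.88),
    ([(100, 50), (150, 50), (150, 70), (100, 70)], "202l", 0.64),
    ([(10, 120), (80, 120), (80, 140), (10, 140)], "Invoice", 0.95),
]

# Letters that OCR commonly confuses with digits.
CONFUSABLE = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})


def fix_digits(text):
    """Map look-alike letters to digits, but only in tokens that already contain a digit."""
    if any(ch.isdigit() for ch in text):
        return text.translate(CONFUSABLE)
    return text


def top_left(bbox):
    """Return the (x, y) of the first corner, taken here as the top-left one."""
    return bbox[0]


def find_date_parts(results, y_tolerance=10):
    """Find a four-digit year, then collect tokens on the same line to its left."""
    cleaned = [(bbox, fix_digits(text)) for bbox, text, _conf in results]
    for bbox, text in cleaned:
        if re.fullmatch(r"(19|20)\d{2}", text):
            year_x, year_y = top_left(bbox)
            same_line = sorted(
                (top_left(b)[0], t)
                for b, t in cleaned
                if abs(top_left(b)[1] - year_y) <= y_tolerance
                and top_left(b)[0] < year_x
            )
            return {"year": text, "left_of_year": [t for _x, t in same_line]}
    return None


print(find_date_parts(ocr_results))
```

    With the sample data, the misread "202l" is corrected to a year, and the tokens sharing its y coordinate are returned left to right as day and month candidates. The y tolerance of 10 pixels is an arbitrary choice that would need tuning for real images.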

  • GitHub repo gensim

    Topic Modelling for Humans

    Project mention: The Levenshtein Distance in Production | news.ycombinator.com | 2021-06-06

    > Problem statement: the Levenshtein distance is a string metric for measuring the difference between two sequences

    Another variant is "I have a bunch of words (a dictionary) and one query word, and want to find all words from the dictionary that are close to the query word".

    This leads to an interesting class of problems, because you can do clever things where you precompute search structures (Levenshtein automata [0]) from the dictionary. The similarity queries then run (much) faster – in production, performance matters.

    We recently merged a PR like that into Gensim [1].

    This gave a ~1,500x speed-up compared to naively comparing all pairwise strings with Levenshtein distance. That is the difference between the training step running for years (=unusable) and minutes.

    [0] http://blog.notdot.net/2010/07/Damn-Cool-Algorithms-Levensht...

    [1] https://github.com/RaRe-Technologies/gensim/pull/3146
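
    For reference, the naive baseline mentioned in this comment (comparing the query word against every dictionary word) can be sketched as below. This is not the Levenshtein-automaton approach from the Gensim PR; it is the slow method that approach replaces. The function names levenshtein and close_words are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between a and b, counting insertions, deletions, and substitutions."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute or keep
            ))
        previous = current
    return previous[-1]


def close_words(query, dictionary, max_distance=1):
    """Naive scan: compute the distance from the query to every dictionary word."""
    return [word for word in dictionary if levenshtein(query, word) <= max_distance]


words = ["cat", "cart", "dog", "bat", "cats"]
print(close_words("cat", words))
```

    Each query costs one full distance computation per dictionary word, which is why precomputed search structures pay off once the dictionary is large.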

  • GitHub repo imgaug

    Image augmentation for machine learning experiments.

    Project mention: [N] Facebook AI Open Sources AugLy: A New Python Library For Data Augmentation To Develop Robust Machine Learning Models | reddit.com/r/MachineLearning | 2021-06-19

    https://github.com/aleju/imgaug This one is way better for images.

  • GitHub repo horovod

    Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

    Project mention: [D] GPU buying recommendation | reddit.com/r/MachineLearning | 2021-07-17

    If you just want to run TensorFlow or PyTorch for a Jupyter notebook, setting up the environment shouldn't be difficult. I know that AWS has a marketplace of preconfigured images. However, you can go as advanced as setting up a cluster of GPU-equipped nodes to run Horovod (https://github.com/horovod/horovod) for distributed machine learning. Yes, there's a learning curve, but you cannot acquire this skill set any other way.

  • GitHub repo tensor2tensor

    Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.

    Project mention: [D] Resources for Understanding The Original Transformer Paper | reddit.com/r/MachineLearning | 2021-09-08

    Code for https://arxiv.org/abs/1706.03762 found: https://github.com/tensorflow/tensor2tensor

  • GitHub repo recommenders

    Best Practices on Recommendation Systems

    Project mention: Opinion on choice of model - Recommender System | reddit.com/r/datascience | 2021-04-10

    Then I tried to find some more advanced models and I found this really good list, and in there I found the Microsoft one. So that's where we are now, with a bunch of different models and not much documentation or tutorials out there.

  • GitHub repo d2l-en

    Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.

    Project mention: I created a way to learn machine learning through Jupyter | reddit.com/r/learnmachinelearning | 2021-04-30

    There are actually some online books and courses built on Jupyter Notebook ([Dive into Deep Learning Book](https://github.com/d2l-ai/d2l-en) for example). However, yours is more detailed and could really help beginners.

  • GitHub repo the-gan-zoo

    A list of all named GANs!

    Project mention: gan-generated paredolia + abstract watercolor + color fuckery = | reddit.com/r/MediaSynthesis | 2021-05-15

    Here's a link to similar programs: https://github.com/hindupuravinash/the-gan-zoo

  • GitHub repo jina

    Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data

    Project mention: Show HN: AI powered meme search, open-source | news.ycombinator.com | 2021-09-03

    We're using Transformers with `sentence-transformers/paraphrase-distilroberta-base-v1` model.

    The framework is Jina (https://github.com/jina-ai/jina/) so it's pretty high-level. You can see the indexing/search Flow on lines 37-52 of https://github.com/alexcg1/jina-meme-search-example/blob/mai...

NOTE: The open source projects on this list are ordered by number of GitHub stars. The number of mentions indicates repo mentions in the last 12 months or since we started tracking (Dec 2020). The latest post mention was on 2021-09-24.

Index

What are some of the best open-source Machine Learning projects in Python? This list will help you:

Rank Project Stars
1 Keras 52,631
2 scikit-learn 47,301
3 Face Recognition 41,526
4 faceswap 38,398
5 gym 25,281
6 data-science-ipython-notebooks 21,599
7 spaCy 21,363
8 NLP-progress 19,108
9 Ray 17,498
10 PaddlePaddle 16,509
11 streamlit 16,005
12 pytorch-lightning 15,420
13 Prophet 13,373
14 rasa 12,682
15 EasyOCR 12,639
16 gensim 12,481
17 imgaug 11,766
18 horovod 11,660
19 tensor2tensor 11,573
20 recommenders 11,328
21 d2l-en 11,051
22 the-gan-zoo 11,033
23 jina 11,020