Top 23 Python Machine Learning Projects
scikit-learn: machine learning in Python
Project mention: Scikit-Learn Version 1.0 | news.ycombinator.com | 2021-09-14
Just to clarify, scikit-learn 1.0 has not been released yet. The latest tag in the github repo is 1.0.rc2
The world's simplest facial recognition api for Python and the command line
Project mention: Photo auto-tagging and facial rec scripts/apps | reddit.com/r/selfhosted | 2021-09-09
There are quite a few decent Python scripts/libraries that would do the job quite simply, such as https://github.com/ageitgey/face_recognition, so you could script something up reasonably easily.
Deepfakes Software For All
Project mention: Is it just me, or is faceswap installation trolling me? | reddit.com/r/faceswap | 2021-08-18
It keeps getting stuck at either "fatal: unable to access 'https://github.com/deepfakes/faceswap.git/':" or "Please run this script with Python version 3.7 or 3.8 64bit and try again."
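The second message suggests the installer checks the interpreter's version and bitness before doing anything else. A minimal sketch of such a check follows, useful for confirming which interpreter is actually being picked up. The function name `is_supported` is hypothetical, and this is not faceswap's own code.

```python
import struct
import sys

def is_supported(version_info, pointer_bits):
    """Return True for 64-bit Python 3.7 or 3.8, the versions named in the error message."""
    major, minor = version_info[0], version_info[1]
    return (major, minor) in {(3, 7), (3, 8)} and pointer_bits == 64

# The size of a C pointer is 8 bytes on a 64-bit interpreter and 4 on a 32-bit one.
bits = struct.calcsize("P") * 8
print(sys.version.split()[0], f"{bits}-bit", "supported:", is_supported(sys.version_info, bits))
```

If this prints an unexpected version, the script is probably being launched with a different interpreter than the one intended.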
A toolkit for developing and comparing reinforcement learning algorithms.
Project mention: The third party environment list is now fixed up and maintained- please submit PRs for any missing environments you're aware of | reddit.com/r/reinforcementlearning | 2021-09-23
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
Project mention: Beginner in Python for Data Science | reddit.com/r/learnpython | 2020-12-27
data science ipython notebooks
💫 Industrial-strength Natural Language Processing (NLP) in Python
Project mention: how to find the word "know" but ignore it if it's "don't know" in a text? | reddit.com/r/learnpython | 2021-08-30
The answers given using regular expression certainly meet the criteria you've specified. However, if you're really wanting to do language processing, you may want to look into a tool like spacy.
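For reference, the regular-expression route mentioned above can be sketched with negative lookbehinds. This is one possible pattern, not the answer given in the thread. The name `count_know` is hypothetical, and only a directly preceding "don't" is excluded, so phrasings such as "do not know" or "don't really know" would still match and are where a parser like spaCy becomes useful.

```python
import re

# Match the whole word "know" unless it is directly preceded by "don't ".
# Python's `re` requires fixed-width lookbehinds, so each spelling variant
# (straight apostrophe, curly apostrophe, no apostrophe) gets its own assertion.
KNOW = re.compile(
    r"(?<!\bdon't )(?<!\bdon’t )(?<!\bdont )\bknow\b",
    re.IGNORECASE,
)

def count_know(text):
    """Count occurrences of 'know' that are not part of \"don't know\"."""
    return len(KNOW.findall(text))

print(count_know("I know that you don't know, you know?"))
```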
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
Project mention: [P] NLP "tl;dr" Notes on Transformers | reddit.com/r/MachineLearning | 2021-08-12
It would also be cool to have some charts with parameter density and even overall effectiveness (a tl;dr version of SOTA-trackers, maybe?) if that doesn't prove too infeasible.
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.
Project mention: Writing your First Distributed Python Application with Ray (without multiprocessing) | reddit.com/r/Python | 2021-08-23
Here is an older discussion on dask vs ray from the creators of both projects: https://github.com/ray-project/ray/issues/642
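PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (core framework of 『飞桨』 (PaddlePaddle): high-performance single-machine and distributed training and cross-platform deployment for deep learning and machine learning)
Project mention: I have issue with only __habs for half datatype? Please help! | reddit.com/r/CUDA | 2021-06-15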
Streamlit: the fastest way to build data apps in Python
Project mention: Get Paid To Train Magic | dev.to | 2021-09-24
Streamlit.io is a really cool Python library I found that helps create data tools really quickly to put on the web. It was the quickest way for me to make an MVP, so I chose it and focused primarily on the backend.
The lightweight PyTorch wrapper for high-performance AI research. Scale your models, not the boilerplate.
Project mention: [P] An introduction to PyKale https://github.com/pykale/pykale, a PyTorch library that provides a unified pipeline-based API for knowledge-aware multimodal learning and transfer learning on graphs, images, texts, and videos to accelerate interdisciplinary research. Welcome feedback/contribution! | reddit.com/r/MachineLearning | 2021-04-25
If you want a good example for reference, take a look at PyTorch Lightning's README (https://github.com/PyTorchLightning/pytorch-lightning). It answers the three questions of "what is this", "why should I care", and "how do I use it" almost instantly.
Tool for producing high quality forecasts for time series data that has multiple seasonality with linear or non-linear growth.
Project mention: Deeney: "I get 30-40 racist messages a week, easily. On pictures of me, my missus, my kids. And I'm a marmite individual, some like me some don't, you can talk about my footballing ability as much as you want - I just don't understand why you have to talk about the colour of my skin." | reddit.com/r/soccer | 2021-09-21
These companies will tell advertisers that their data science teams and their algorithms are the best in the world. They hire some of the best and brightest minds, and their clustering algorithms and forecast libraries are some top-tier solutions.
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants
Project mention: Building a chatbot - How should I approach this? | reddit.com/r/learnpython | 2021-08-12
As u/Hungry_Check_9153 says regarding your picture of how chatbots work, I recommend looking at Rasa, which is an open source Python chatbot framework. To get an idea of the sheer scope of such a project, take a look at their GitHub. Building a chatbot using Rasa may be a good first step and offers plenty of experience writing and learning Python code.
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Project mention: How would one go about recognizing a text (date) in an image? | reddit.com/r/MLQuestions | 2021-09-07
Try running https://github.com/JaidedAI/EasyOCR, which reads all the text from a given image. Then loop over the words read and see whether it reads your date properly or splits it into multiple words. Even if it splits the date, you can easily find the year (you might need to do some postprocessing on what the OCR reads, i.e. transform l to 1). The nice thing is that you get the bounding boxes of where the text was read, so you can use some sort of postprocessing on bbox locations and words read to find the month and day (i.e. I read 25 at the same y index as the year but the x index is to the left, so it's probably the day). Good luck :)
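The postprocessing idea described above can be sketched without running any OCR. The snippet below assumes results shaped like EasyOCR's `readtext` output, a list of `(bbox, text, confidence)` tuples where `bbox` holds four `[x, y]` corner points. The sample data, the helper names, and the character substitutions are all hypothetical illustrations.

```python
import re

# Assumed look-alike substitutions (letter read where a digit was printed).
DIGIT_FIXES = str.maketrans({"l": "1", "I": "1", "O": "0", "o": "0"})

def as_number(text):
    """Return the token as an int after fixing look-alike characters, else None."""
    fixed = text.translate(DIGIT_FIXES)
    return int(fixed) if re.fullmatch(r"[0-9]+", fixed) else None

def center(bbox):
    """Centre point of a box given as a list of [x, y] corners."""
    xs = [p[0] for p in bbox]
    ys = [p[1] for p in bbox]
    return sum(xs) / len(xs), sum(ys) / len(ys)

def find_day_and_year(results, y_tolerance=10):
    """Find a plausible year, then the nearest 1-31 number to its left on the same line."""
    tokens = [(center(bbox), as_number(text)) for bbox, text, _conf in results]
    tokens = [(c, n) for c, n in tokens if n is not None]
    for (year_x, year_y), year in tokens:
        if not 1900 <= year <= 2100:
            continue
        left = [
            (x, n) for (x, y), n in tokens
            if 1 <= n <= 31 and x < year_x and abs(y - year_y) <= y_tolerance
        ]
        # The candidate with the largest x is the one closest to the year.
        day = max(left)[1] if left else None
        return day, year
    return None, None

# Hypothetical OCR output: the year was misread as "2O2l".
sample = [
    ([[10, 10], [40, 10], [40, 30], [10, 30]], "25", 0.90),
    ([[50, 10], [90, 10], [90, 30], [50, 30]], "Sep", 0.80),
    ([[100, 10], [150, 10], [150, 30], [100, 30]], "2O2l", 0.60),
    ([[10, 60], [80, 60], [80, 80], [10, 80]], "Invoice", 0.95),
]
print(find_day_and_year(sample))
```

The month could be recovered the same way by matching the tokens between the day and the year against a list of month names.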
Topic Modelling for Humans
Project mention: The Levenshtein Distance in Production | news.ycombinator.com | 2021-06-06
> Problem statement: the Levenshtein distance is a string metric for measuring the difference between two sequences
Another variant is "I have a bunch of words (a dictionary) and one query word, and want to find all words from the dictionary that are close to the query word".
This leads to an interesting class of problems, because you can do clever things where you precompute search structures (Levenshtein automata) from the dictionary. The similarity queries then run (much) faster – in production, performance matters.
We recently merged a PR like that into Gensim.
This gave a ~1,500x speed-up compared to naively comparing all pairwise strings with Levenshtein distance. A difference between the training step running for years (=unusable) and minutes.
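For context, the naive baseline that the comment compares against can be sketched as follows: compute the edit distance between the query and every dictionary word. This is the slow approach, not the Levenshtein-automaton technique and not Gensim's implementation. The function names are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings using a two-row dynamic programme."""
    if len(a) < len(b):
        a, b = b, a  # keep the inner row as short as possible
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # deletion
                current[j - 1] + 1,                    # insertion
                previous[j - 1] + (char_a != char_b),  # substitution
            ))
        previous = current
    return previous[-1]

def close_words(query, dictionary, max_distance=1):
    """Return every dictionary word within max_distance edits of the query."""
    return [word for word in dictionary if levenshtein(query, word) <= max_distance]

print(close_words("cat", ["cat", "cart", "dog", "bat", "cats"]))
```

Each query costs one full distance computation per dictionary word, which is why precomputed search structures pay off for large dictionaries.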
Image augmentation for machine learning experiments.
Project mention: [N] Facebook AI Open Sources AugLy: A New Python Library For Data Augmentation To Develop Robust Machine Learning Models | reddit.com/r/MachineLearning | 2021-06-19
https://github.com/aleju/imgaug This one is way better for images.
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.
Project mention: [D] GPU buying recommendation | reddit.com/r/MachineLearning | 2021-07-17
If you just want to run TensorFlow or PyTorch for a Jupyter notebook, setting up the environment shouldn't be difficult. I know that AWS has a marketplace of preconfigured images. However, you can go as advanced as setting up a cluster of GPU-equipped nodes to run Horovod (https://github.com/horovod/horovod) for distributed machine learning. Yes, there's a learning curve, but you cannot acquire this skill set any other way.
Library of deep learning models and datasets designed to make deep learning more accessible and accelerate ML research.
Project mention: [D] Resources for Understanding The Original Transformer Paper | reddit.com/r/MachineLearning | 2021-09-08
Code for https://arxiv.org/abs/1706.03762 found: https://github.com/tensorflow/tensor2tensor
Best Practices on Recommendation Systems
Project mention: Opinion on choice of model - Recommender System | reddit.com/r/datascience | 2021-04-10
Then I tried to find some more advanced models and I found this really good list, and in there I found the Microsoft one. So that's where we are now, with a bunch of different models and not much documentation or many tutorials out there.
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 200 universities.
Project mention: I created a way to learn machine learning through Jupyter | reddit.com/r/learnmachinelearning | 2021-04-30
There are actually some online books and courses built on Jupyter Notebook ([Dive into Deep Learning book](https://github.com/d2l-ai/d2l-en), for example). However, yours is more detailed and could really help beginners.
A list of all named GANs!
Project mention: gan-generated paredolia + abstract watercolor + color fuckery = | reddit.com/r/MediaSynthesis | 2021-05-15
Here's a link to similar programs: https://github.com/hindupuravinash/the-gan-zoo
Cloud-native neural search framework for 𝙖𝙣𝙮 kind of data
Project mention: Show HN: AI powered meme search, open-source | news.ycombinator.com | 2021-09-03
We're using Transformers with `sentence-transformers/paraphrase-distilroberta-base-v1` model.
The framework is Jina (https://github.com/jina-ai/jina/) so it's pretty high-level. You can see the indexing/search Flow on lines 37-52 of https://github.com/alexcg1/jina-meme-search-example/blob/mai...
What are some of the best open-source Machine Learning projects in Python? This list will help you: