Top 23 Python Machine Learning Projects
-
Thanks! :) I'm pushing them into transformers, pytorch-gemma and collabing with the Gemma team to resolve all the issues :)
The RoPE fix should already be in transformers 4.38.2: https://github.com/huggingface/transformers/pull/29285
My main PR for transformers which fixes most of the issues (some still left): https://github.com/huggingface/transformers/pull/29402
-
Project mention: Best AI Tools for Students Learning Development and Engineering | dev.to | 2024-03-18
Which label applies to a tool sometimes depends on what you do with it. For example, PyTorch or TensorFlow can be called a library, a toolkit, or a machine-learning framework.
-
All breaking changes are listed here: https://github.com/keras-team/keras/issues/18467
You can use this migration guide to identify and fix each of these issues (and, further, to make your code run on JAX or PyTorch): https://keras.io/guides/migrating_to_keras_3/
-
sklearn is adding support through the dataframe interchange protocol (https://github.com/scikit-learn/scikit-learn/issues/25896). scipy, as far as I know, doesn't explicitly support dataframes (it just happens to work when you wrap a Series in `np.array` or `np.asarray`). I don't know about PyTorch but in general you can convert to numpy.
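To illustrate why wrapping a column in `np.asarray` tends to work, here is a minimal sketch. `ColumnLike` is a hypothetical stand-in for a dataframe column, not a real library class. It assumes only that the column object exposes NumPy's `__array__` conversion hook, which is what `np.asarray` looks for.

```python
import numpy as np


class ColumnLike:
    """Hypothetical stand-in for a dataframe column (for example, a Series)."""

    def __init__(self, values):
        self._values = list(values)

    def __array__(self, dtype=None, copy=None):
        # NumPy calls this hook when asked to convert the object to an array.
        return np.array(self._values, dtype=dtype)


heights = ColumnLike([1.5, 2.0, 2.5])

# np.asarray returns a plain ndarray that array-based routines can consume.
arr = np.asarray(heights)
mean_height = float(arr.mean())
```

Any routine that expects a NumPy array can then take `arr` directly, whether or not it knows anything about dataframes.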
-
Camera connected to a PI? Something like this could run locally: https://github.com/ageitgey/face_recognition
-
Project mention: faceswap VS facefusion - a user suggested alternative | libhunt.com/r/faceswap | 2024-01-30
-
Project mention: How would i go about having YOLO v5 return me a list from left to right of all detected objects in an image? | /r/computervision | 2023-11-13
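One possible approach to the question in that title, sketched without the YOLOv5 API: once each detection is available as a label plus a bounding box, sorting by the left edge of the box gives a left-to-right order. The `Detection` type, its field names, and the sample values below are assumptions made for illustration, not YOLOv5's output format.

```python
from typing import List, NamedTuple


class Detection(NamedTuple):
    """Hypothetical detection record: label plus pixel-coordinate box."""
    label: str
    x_min: float
    y_min: float
    x_max: float
    y_max: float
    confidence: float


def sort_left_to_right(detections: List[Detection]) -> List[Detection]:
    # Order by the left edge of each bounding box.
    return sorted(detections, key=lambda d: d.x_min)


# Made-up sample detections, deliberately out of order.
detections = [
    Detection("dog", 320.0, 80.0, 480.0, 300.0, 0.91),
    Detection("person", 40.0, 20.0, 200.0, 400.0, 0.88),
    Detection("bicycle", 210.0, 150.0, 330.0, 380.0, 0.76),
]

ordered_labels = [d.label for d in sort_left_to_right(detections)]
```

Sorting by the box center instead of the left edge is a reasonable alternative when objects differ a lot in width.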
-
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
For open assistant, the code: https://github.com/LAION-AI/Open-Assistant/tree/main/inference
-
Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12
Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
-
A co-founder announced they disbanded their robots team a couple years ago: https://venturebeat.com/business/openai-disbands-its-robotic...
That was around the same time they deprecated OpenAI Gym: https://github.com/openai/gym
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06
DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?). I'm surprised I don't see this project used more.
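As a rough illustration of the RAM and NVMe offloading mentioned above, the sketch below builds a DeepSpeed-style configuration as a plain Python dictionary. The key names follow my understanding of DeepSpeed's ZeRO configuration schema and should be checked against the current documentation. The batch size and the `/local_nvme` path are placeholder assumptions.

```python
import json

# Sketch of a ZeRO stage 3 config with offloading; all values are illustrative.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "zero_optimization": {
        "stage": 3,
        # Keep optimizer state in CPU RAM instead of GPU memory.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        # Spill parameters to NVMe storage; the path is a placeholder.
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}

# Serialize to JSON, the format DeepSpeed config files normally use.
config_json = json.dumps(ds_config, indent=2)
```

The resulting JSON would typically be saved to a file and passed to the training launcher. Whether offloading helps depends on the model size and the speed of the storage involved.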
-
While building dashboards in Streamlit, I found myself really missing Buefy's (Bulma) modern web components, especially given the inability to add new values to Streamlit's multiselect [1] and some missing controls, like a polished image carousel [2] or a highly customizable data table.
Long story short, we put together streamfy (Streamlit + Buefy) as an MIT-licensed project on GitHub to bring Buefy to Streamlit.
Demo: https://streamfy.streamlit.app
All the form components are implemented; about half of the other non-form UX components are still missing. There is plenty of room for PRs, testing, feedback, documentation, examples, etc.
Please send issues and contributions to the GitHub project [3] and general feedback to X / Twitter [4].
Thanks!
-
Ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
-
SpaCy: An open-source library providing tools for advanced NLP tasks like tokenization, entity recognition, and part-of-speech tagging.
-
Project mention: Show HN: Dropbase – Build internal web apps with just Python | news.ycombinator.com | 2023-12-05
There's also that library all the AI models started using that gives you a public URL to share. After researching it: https://www.gradio.app/ is the link.
It's used specifically for making simple UIs for machine learning apps. But I guess technically you could use it for anything.
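A minimal sketch of the kind of simple UI described above, assuming Gradio is installed. The `greet` function and both helper names are hypothetical. Gradio is imported lazily so that `greet` stays usable without it, and `share=True` is the launch option that requests a temporary public URL.

```python
def greet(name: str) -> str:
    """Toy function standing in for a model's prediction function."""
    return f"Hello, {name}!"


def build_demo():
    # Imported here so that greet() can be used without Gradio installed.
    import gradio as gr

    # One text input wired to one text output.
    return gr.Interface(fn=greet, inputs="text", outputs="text")


def launch_shared():
    # Starts a local server and asks Gradio for a shareable public link.
    build_demo().launch(share=True)
```

Calling `launch_shared()` starts the app. Nothing runs at import time, so the wrapped function can be tested on its own.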
-
pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
Project mention: Lightning AI Studios – A persistent GPU cloud environment | news.ycombinator.com | 2023-12-14
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
Project mention: Open-Sourcing High-Frequency Trading and Market-Making Backtesting Tool | /r/Python | 2023-12-06
You might want to suggest this as an extension to the OpenBB project - I imagine that could be of interest to them if there isn’t something like it built in already :-)
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
-
NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic, etc.
Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | dev.to | 2023-12-27
PyTesseract Module, EasyOCR Module, PaddlePaddle OCR
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
Project mention: which book to choose for deep learning: Ian Goodfellow or Francois Chollet | /r/learnmachinelearning | 2023-04-07
-
Project mention: The CEO of Ultralytics (yolov8) using LLMs to engage with commenters on GitHub | news.ycombinator.com | 2024-02-12
Yep, I noticed this a while ago. It posts easily identifiable ChatGPT responses. It also posts garbage wrong answers which makes it worse than useless. Totally disrespectful to the userbase.
https://github.com/ultralytics/ultralytics/issues/5748#issue...
-
Python Machine Learning related posts
- Efficiently Managing and Querying Visual Data With MongoDB Atlas Vector Search and FiftyOne
- FiftyOne Computer Vision Tips and Tricks - March 15, 2024
- Half-Quadratic Quantization of Large Machine Learning Models
- Fundamental Components of Deep Learning (category theory) [pdf]
- A History of CLIP Model Training Data Advances
- Ship Faster by Organising Less
- Mandala: A little playground for testing pixel logic patterns
Index
What are some of the best open-source Machine Learning projects in Python? This list will help you:
# | Project | Stars |
---|---|---|
1 | transformers | 122,103 |
2 | Pytorch | 76,684 |
3 | Keras | 60,643 |
4 | scikit-learn | 57,674 |
5 | Face Recognition | 51,332 |
6 | faceswap | 48,827 |
7 | yolov5 | 45,808 |
8 | Open-Assistant | 36,472 |
9 | Airflow | 33,864 |
10 | gym | 33,676 |
11 | DeepSpeed | 31,898 |
12 | streamlit | 30,808 |
13 | Ray | 30,364 |
14 | spaCy | 28,455 |
15 | gradio | 27,486 |
16 | pytorch-lightning | 26,457 |
17 | data-science-ipython-notebooks | 26,278 |
18 | OpenBBTerminal | 25,785 |
19 | ML-From-Scratch | 23,004 |
20 | NLP-progress | 22,238 |
21 | EasyOCR | 21,448 |
22 | d2l-en | 21,232 |
23 | ultralytics | 20,652 |