Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality. Learn more →
Top 23 Python Machine Learning Projects
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
-
WorkOS
The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.
-
Open-Assistant
OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
-
Ray
Ray is a unified framework for scaling AI and Python applications. Ray consists of a core distributed runtime and a set of AI Libraries for accelerating ML workloads.
-
pytorch-lightning
Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
-
data-science-ipython-notebooks
Data science Python notebooks: Deep learning (TensorFlow, Theano, Caffe, Keras), scikit-learn, Kaggle, big data (Spark, Hadoop MapReduce, HDFS), matplotlib, pandas, NumPy, SciPy, Python essentials, AWS, and various command lines.
-
ML-From-Scratch
Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
-
NLP-progress
Repository to track the progress in Natural Language Processing (NLP), including the datasets and the current state-of-the-art for the most common NLP tasks.
-
EasyOCR
Ready-to-use OCR with 80+ supported languages and all popular writing scripts including Latin, Chinese, Arabic, Devanagari, Cyrillic and etc.
-
d2l-en
Interactive deep learning book with multi-framework code, math, and discussions. Adopted at 500 universities from 70 countries including Stanford, MIT, Harvard, and Cambridge.
-
SaaSHub
SaaSHub - Software Alternatives and Reviews. SaaSHub helps you find the best software and product alternatives
Project mention: Maxtext: A simple, performant and scalable Jax LLM | news.ycombinator.com | 2024-04-23Is t5x an encoder/decoder architecture?
Some more general options.
The Flax ecosystem
https://github.com/google/flax?tab=readme-ov-file
or dm-haiku
https://github.com/google-deepmind/dm-haiku
were some of the best developed communities in the Jax AI field
Perhaps the “trax” repo? https://github.com/google/trax
Some HF examples https://github.com/huggingface/transformers/tree/main/exampl...
Sadly it seems much of the work is proprietary these days, but one example could be Grok-1, if you customize the details. https://github.com/xai-org/grok-1/blob/main/run.py
TensorFlow, developed by Google, and PyTorch, developed by Facebook, are two of the most popular frameworks for building and training complex machine learning models. TensorFlow is known for its flexibility and robust scalability, making it suitable for both research prototypes and production deployments. PyTorch is praised for its ease of use, simplicity, and dynamic computational graph that allows for more intuitive coding of complex AI models. Both frameworks support a wide range of AI models, from simple linear regression to complex deep neural networks.
As a beginner, I was looking for something simple and flexible for developing deep learning models and that is when I found Keras. Many AI/ML professionals appreciate Keras for its simplicity and efficiency in prototyping and developing deep learning models, making it a preferred choice, especially for beginners and for projects requiring rapid development.
Project mention: AutoCodeRover resolves 22% of real-world GitHub in SWE-bench lite | news.ycombinator.com | 2024-04-09Thank you for your interest. There are some interesting examples in the SWE-bench-lite benchmark which are resolved by AutoCodeRover:
- From sympy: https://github.com/sympy/sympy/issues/13643. AutoCodeRover's patch for it: https://github.com/nus-apr/auto-code-rover/blob/main/results...
- Another one from scikit-learn: https://github.com/scikit-learn/scikit-learn/issues/13070. AutoCodeRover's patch (https://github.com/nus-apr/auto-code-rover/blob/main/results...) modified a few lines below (compared to the developer patch) and wrote a different comment.
There are more examples in the results directory (https://github.com/nus-apr/auto-code-rover/tree/main/results).
Camera connected to a PI? Something like this could run locally: https://github.com/ageitgey/face_recognition
Project mention: faceswap VS facefusion - a user suggested alternative | libhunt.com/r/faceswap | 2024-01-30
Ref https://www.youtube.com/watch?v=0GwnxFNfZhM https://github.com/ultralytics/yolov5 https://dev.to/gfstealer666/kaaraich-yolo-alkrithuemainkaartrwcchcchabwatthu-object-detection-3lef https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/data
For open assistant, the code: https://github.com/LAION-AI/Open-Assistant/tree/main/inference
Project mention: Building in Public: Leveraging Tublian's AI Copilot for My Open Source Contributions | dev.to | 2024-02-12Contributing to Apache Airflow's open-source project immersed me in collaborative coding. Experienced maintainers rigorously reviewed my contributions, providing constructive feedback. This ongoing dialogue refined the codebase and honed my understanding of best practices.
A co-founder announced they disbanded their robots team a couple years ago: https://venturebeat.com/business/openai-disbands-its-robotic...
That was the same time they depreciated OpenAI Gym: https://github.com/openai/gym
Project mention: Can we discuss MLOps, Deployment, Optimizations, and Speed? | /r/LocalLLaMA | 2023-12-06DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?) . I'm surprised I don't see this project used more.
Note that there are many tools that make this easier/simpler to prototype, including chainlit, streamlit, etc… The backend API we built is amenable to interacting with them as well.
22. Ray | Github | tutorial
Project mention: Show HN: Dropbase – Build internal web apps with just Python | news.ycombinator.com | 2023-12-05There's also that library all the AI models started using that gives you a public URL to share. After researching it: https://www.gradio.app/ is the link.
It's used specifically for making simple UIs for machine learning apps. But I guess technically you could use it for anything.
Project mention: Step by step guide to create customized chatbot by using spaCy (Python NLP library) | dev.to | 2024-03-23Hi Community, In this article, I will demonstrate below steps to create your own chatbot by using spaCy (spaCy is an open-source software library for advanced natural language processing, written in the programming languages Python and Cython):
Project mention: Lightning AI Studios – A persistent GPU cloud environment | news.ycombinator.com | 2023-12-14
Project mention: Open-Sourcing High-Frequency Trading and Market-Making Backtesting Tool | /r/Python | 2023-12-06You might want to suggest this as an extension to the OpenBB project - I imagine that could be of interest to them if there isn’t something like it built in already :-)
Project mention: The CEO of Ultralytics (yolov8) using LLMs to engage with commenters on GitHub | news.ycombinator.com | 2024-02-12Yep, I noticed this a while ago. It posts easily identifiable ChatGPT responses. It also posts garbage wrong answers which makes it worse than useless. Totally disrespectful to the userbase.
https://github.com/ultralytics/ultralytics/issues/5748#issue...
Project mention: Leveraging GPT-4 for PDF Data Extraction: A Comprehensive Guide | dev.to | 2023-12-27PyTesseract Module [ Github ] EasyOCR Module [ Github ] PaddlePaddle OCR [ Github ]
Python Machine Learning related posts
- Building an Email Assistant Application with Burr
- Voxel51 Is Hiring AI Researchers and Scientists — What the New Open Science Positions Mean
- Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations
- Show HN: I made a ROS package for realtime semantic segmentation
- How to Estimate Depth from a Single Image
- Maxtext: A simple, performant and scalable Jax LLM
- My Favorite DevTools to Build AI/ML Applications!
-
A note from our sponsor - InfluxDB
www.influxdata.com | 26 Apr 2024
Index
What are some of the best open-source Machine Learning projects in Python? This list will help you:
Project | Stars | |
---|---|---|
1 | transformers | 125,021 |
2 | Pytorch | 77,783 |
3 | Keras | 60,937 |
4 | scikit-learn | 58,046 |
5 | Face Recognition | 51,755 |
6 | faceswap | 49,178 |
7 | yolov5 | 46,921 |
8 | Open-Assistant | 36,622 |
9 | Airflow | 34,485 |
10 | gym | 33,873 |
11 | DeepSpeed | 32,550 |
12 | streamlit | 31,506 |
13 | Ray | 31,101 |
14 | gradio | 28,730 |
15 | spaCy | 28,704 |
16 | pytorch-lightning | 26,883 |
17 | data-science-ipython-notebooks | 26,459 |
18 | OpenBBTerminal | 26,055 |
19 | ML-From-Scratch | 23,164 |
20 | ultralytics | 22,624 |
21 | NLP-progress | 22,296 |
22 | EasyOCR | 21,882 |
23 | d2l-en | 21,628 |
Sponsored