Top 23 ML Open-Source Projects
-
dopamine
Dopamine is a research framework for fast prototyping of reinforcement learning algorithms.
-
MNN
MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
-
deeplake
Database for AI. Store Vectors, Images, Texts, Videos, etc. Use with LLMs/LangChain. Store, query, version, & visualize any AI data. Stream data in real-time to PyTorch/TensorFlow. https://activeloop.ai
-
unstructured
Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.
    # L2-normalize the encoding tensors
    image_encoding = tf.math.l2_normalize(image_encoding, axis=1)
    audio_encoding = tf.math.l2_normalize(audio_encoding, axis=1)

    # Find euclidean distance between image_encoding and audio_encoding
    # Essentially trying to detect if the face is saying the audio
    # Will return nan without the 1e-12 offset due to https://github.com/tensorflow/tensorflow/issues/12071
    d = tf.norm((image_encoding - audio_encoding) + 1e-12, ord='euclidean', axis=1, keepdims=True)

    discriminator = keras.Model(inputs=[image_input, audio_input], outputs=[d], name="discriminator")
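To make the arithmetic in the snippet above concrete, here is a minimal plain-Python sketch (no TensorFlow) of L2-normalizing two vectors and taking their Euclidean distance. The function names, the `eps` guard, and the sample vectors are illustrative assumptions, not part of the quoted code. Once both vectors have unit length, the distance falls between 0 (same direction) and 2 (opposite directions).

```python
import math


def l2_normalize(vec, eps=1e-12):
    """Scale vec to unit length; eps (an assumed guard) avoids dividing by zero."""
    squared_sum = sum(x * x for x in vec)
    scale = math.sqrt(max(squared_sum, eps))
    return [x / scale for x in vec]


def euclidean_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Hypothetical encodings standing in for the image and audio embeddings.
image_encoding = l2_normalize([3.0, 4.0])
audio_encoding = l2_normalize([4.0, 3.0])

d = euclidean_distance(image_encoding, audio_encoding)

# For unit vectors, d**2 equals 2 - 2 * cosine_similarity.
cosine = sum(x * y for x, y in zip(image_encoding, audio_encoding))
print(d, math.sqrt(2 - 2 * cosine))
```

A small distance then reads as "these two encodings point the same way", which is the signal the quoted discriminator outputs.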
- https://github.com/microsoft/ML-For-Beginners
Also check out this list Pitt puts out every year:
Ref
https://www.youtube.com/watch?v=0GwnxFNfZhM
https://github.com/ultralytics/yolov5
https://dev.to/gfstealer666/kaaraich-yolo-alkrithuemainkaartrwcchcchabwatthu-object-detection-3lef
https://www.kaggle.com/datasets/devdgohil/the-oxfordiiit-pet-dataset/data
They're data-dependence graphs for a neural-network scheduling problem. Like this but way bigger to start with and then lowered to more detailed representations several times: https://netron.app/?url=https://github.com/onnx/models/raw/m... My home-grown layout engine can handle the 12k nodes for llama2 in its highest-level form in 20s or so, but it's not the most featureful, and they only get bigger from there. So I always have an eye out for potential tools.
Project mention: What’s the Difference Between Fine-tuning, Retraining, and RAG? | dev.to | 2024-04-08
Check us out on GitHub.
Project mention: Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations | dev.to | 2024-04-26
How can this be? The current state of practice in AI/ML work requires adaptivity, which is uncommon in classical computational fields. There are myriad tools that capture the work across the many instances of the AI/ML lifecycle. The idea that any one tool could sufficiently capture the dynamic work is unrealistic. Take, for example, an experiment tracking tool like W&B or MLFlow; some form of experiment tracking is necessary in typical model training lifecycles. Such a tool requires some notion of a dataset. However, a tool focusing on experiment tracking is orthogonal to the needs of analyzing model performance at the data sample level, which is critical to understanding the failure modes of models. The way one does this depends on the type of data and the AI/ML task at hand. In other words, MLOps is inherently an intricate mosaic, as the capabilities and best practices of AI/ML work evolve.
Project mention: The Era of 1-bit LLMs: ternary parameters for cost-effective computing | news.ycombinator.com | 2024-02-28
https://github.com/Stability-AI/StableLM?tab=readme-ov-file#...
Project mention: Show HN: Toolkit for LLM Fine-Tuning, Ablating and Testing | news.ycombinator.com | 2024-04-07
This is a great project, a little bit similar to https://github.com/ludwig-ai/ludwig, but it includes testing capabilities and ablation.
Questions regarding the LLM testing aspect: How extensive is the test coverage for LLM use cases, and what is the current state of this project area? Do you offer any guarantees, or is it considered an open-ended problem?
Would love to see more progress toward this area!
Project mention: [D][R] Deploying deep models on memory constrained devices | /r/MachineLearning | 2023-10-03
However, I am looking at this subject through the lens of training/fine-tuning deep models on edge devices, which is becoming increasingly feasible. Looking at tflite, Alibaba's MNN, mit-han-lab's tinyengine, etc.
Be careful with unstructured:
https://github.com/Unstructured-IO/unstructured/blob/d11c70c...
from: https://github.com/open-webui/open-webui/issues/687
Yet another TEDIOUS BATTLE: Python vs. C++/C stack.
This project gained popularity due to the HIGH DEMAND for running large models with 1B+ parameters, like `llama`. Python dominates the interface and training ecosystem, but prior to llama.cpp, non-ML professionals showed little interest in a fast C++ interface library. While existing solutions like tensorflow-serving [1] in C++ were sufficiently fast with GPU support, llama.cpp took the initiative to optimize for CPU and trim unnecessary code, essentially code-golfing and sacrificing some algorithm correctness for improved performance, which isn't favored by "ML research".
NOTE: In my opinion, a true pioneer was DarkNet, which implemented the YOLO model series and significantly outperformed others [2]. It used basically the same trick as llama.cpp.
[1] https://github.com/tensorflow/serving
Project mention: Open-sourcing a simple automation/agent workflow builder | /r/ChatGPTPro | 2023-10-07
We're open-sourcing a project that lets you build simple automations/agent workflows that use LLMs for different tasks. Kinda like Zapier or IFTTT but focused on using natural language to accomplish your tasks. It's super early but we'd love to start getting feedback to steer it in the right direction. It currently supports OpenAI and local models through llm.
ML related posts
-
Show HN: LLM-powered NPCs running on your hardware
-
Observations on MLOps–A Fragmented Mosaic of Mismatched Expectations
-
Machine Learning with PHP
-
Show HN: Open-source Google Docs for audio transcriptions (Whisper)
-
What’s the Difference Between Fine-tuning, Retraining, and RAG?
-
W3C discussions of impact of ML models on the web
-
Why do tree-based models still outperform deep learning on tabular data? (2022)
Index
What are some of the best open-source ML projects? This list will help you:
# | Project | Stars |
---|---|---|
1 | tensorflow | 182,693 |
2 | ML-For-Beginners | 67,111 |
3 | yolov5 | 47,202 |
4 | netron | 26,174 |
5 | handson-ml | 25,094 |
6 | MindsDB | 21,354 |
7 | MLflow | 17,335 |
8 | StableLM | 15,851 |
9 | best-of-ml-python | 15,633 |
10 | kubeflow | 13,700 |
11 | awesome-mlops | 11,769 |
12 | ludwig | 10,845 |
13 | dopamine | 10,378 |
14 | ML.NET | 8,855 |
15 | pycaret | 8,450 |
16 | MNN | 8,325 |
17 | deeplake | 7,729 |
18 | metaflow | 7,630 |
19 | unstructured | 6,515 |
20 | CoreML-Models | 6,241 |
21 | serving | 6,085 |
22 | llm | 5,931 |
23 | oneflow | 5,731 |