serving
maturin
| | serving | maturin |
|---|---|---|
| Mentions | 12 | 37 |
| Stars | 6,071 | 3,232 |
| Growth | 0.2% | 5.8% |
| Activity | 9.8 | 9.4 |
| Last commit | about 18 hours ago | 6 days ago |
| Language | C++ | Rust |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
serving
-
Llama.cpp: Full CUDA GPU Acceleration
Yet another TEDIOUS BATTLE: Python vs. C++/C stack.
This project gained popularity due to the HIGH DEMAND for running large models with 1B+ parameters, like `llama`. Python dominates the interface and training ecosystem, but prior to llama.cpp, non-ML professionals showed little interest in a fast C++ interface library. While existing solutions like tensorflow-serving [1] in C++ were sufficiently fast and had GPU support, llama.cpp took the initiative to optimize for CPU and trim unnecessary code, essentially code-golfing and sacrificing some algorithmic correctness for improved performance, which isn't favored by ML researchers.
NOTE: In my opinion, a true pioneer was DarkNet, which implemented the YOLO model series and significantly outperformed others [2]. It used essentially the same trick as llama.cpp.
[1] https://github.com/tensorflow/serving
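The correctness-for-speed trade mentioned above can be illustrated with weight quantization, one of the techniques llama.cpp leans on for CPU inference. This is a simplified sketch of symmetric int8 quantization, not llama.cpp's actual block-wise scheme:

```python
# Illustrative sketch (not llama.cpp's actual format): symmetric int8
# quantization of a weight row. Storing 8-bit integers instead of 32-bit
# floats cuts memory bandwidth, at the cost of a bounded rounding error.

def quantize_int8(weights):
    """Map floats to int8 values plus one per-row scale factor."""
    scale = max(abs(w) for w in weights) / 127.0
    q = [round(w / scale) for w in weights]
    return q, scale

def dequantize_int8(q, scale):
    """Recover approximate floats from the quantized row."""
    return [v * scale for v in q]

weights = [0.12, -0.53, 0.98, -0.07]
q, scale = quantize_int8(weights)
restored = dequantize_int8(q, scale)
max_err = max(abs(a - b) for a, b in zip(weights, restored))
# The round-trip error is bounded by half a quantization step.
assert max_err <= scale / 2 + 1e-12
```

The model's outputs drift slightly, which is exactly the "sacrificing some algorithmic correctness" the post refers to, but for inference the speedup usually wins.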
-
[D] How do OpenAI and other companies manage to have real-time inference on model with billions of parameters over an API?
I mean, probably - it's written in C++ https://github.com/tensorflow/serving
-
Should I wait for the M2 Macbook Pro?
We’re looking into that solution at the moment. The issue I’m referring to is related to https://github.com/tensorflow/serving/issues/1948 and we’ll know soon whether the plug-in approach works for our uses, but we haven’t started implementing it yet.
- TF Serving has been unavailable for 9 days so far due to outdated GPG key
- TF Serving has been unavailable for 8 days
-
Would you use maturin for ML model serving?
Which ML framework do you use? TensorFlow has https://github.com/tensorflow/serving. You could also use the Rust bindings to load a saved model and expose it through one of the Rust HTTP servers. It doesn't matter whether you trained your model in Python as long as you export it as a SavedModel.
-
Is LaMDA Sentient? – An Interview [pdf]
Most likely it's a model server running something like https://github.com/tensorflow/serving and if there isn't a lot of load, the resource could kill some of its tasks. I wouldn't imagine it's sitting around pondering deep thoughts.
-
Ask HN: How to deploy a TensorFlow model for access through an HTTP endpoint?
https://github.com/tensorflow/serving
https://thenewstack.io/tutorial-deploying-tensorflow-models-...
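Once TF Serving is running, the HTTP endpoint is just its documented REST API. A minimal client sketch, assuming a server on the default REST port 8501 and a placeholder model name "my_model":

```python
# Sketch of a client for TF Serving's REST :predict endpoint. The server
# itself is the C++ binary from tensorflow/serving; host, port 8501, and
# the model name "my_model" are assumptions for this example.
import json
from urllib import request

def predict_request(host, model, instances):
    """Build the POST request for TF Serving's :predict endpoint."""
    url = f"http://{host}:8501/v1/models/{model}:predict"
    body = json.dumps({"instances": instances}).encode("utf-8")
    return request.Request(url, data=body,
                           headers={"Content-Type": "application/json"})

req = predict_request("localhost", "my_model", [[1.0, 2.0, 3.0]])
# Against a live server you would then read the predictions with:
#   json.load(request.urlopen(req))["predictions"]
```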
-
Popular Machine Learning Deployment Tools
GitHub
-
If data science uses a lot of computational power, then why is python the most used programming language?
You serve models via https://www.tensorflow.org/tfx/guide/serving, which is written entirely in C++ (https://github.com/tensorflow/serving/tree/master/tensorflow_serving/model_servers), with no Python on the serving path or in the shipped product.
maturin
-
In Rust for Python: A Match from Heaven
This story unfolds as a captivating journey where the agile Flounder, representing the Python programming language, navigates the vast seas of coding under the wise guidance of Sebastian, symbolizing Rust. Central to their adventure are three powerful tridents: cargo, PyO3, and maturin.
-
Feedback from calling Rust from Python
-- Maturin on GitHub
-
Some Reasons to Avoid Cython
My new favorite way to write very fast libraries for Python is to just use Rust and Maturin:
https://github.com/PyO3/maturin
It basically automates everything for you. If you use it with GitHub Actions, it will compile wheels for you on each release, for every platform and Python version you want, and even upload them to PyPI for you. Everything feels very modern and well thought out. People really care about good tooling in the Rust world.
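For reference, the setup the comment praises amounts to pointing `pyproject.toml` at maturin as the build backend. A minimal sketch, where the package name "fastlib" is a placeholder:

```toml
# Minimal pyproject.toml for a maturin-built extension module.
# "fastlib" is a placeholder name; adjust to your crate/package.
[build-system]
requires = ["maturin>=1.0,<2.0"]
build-backend = "maturin"

[project]
name = "fastlib"
requires-python = ">=3.8"

[tool.maturin]
features = ["pyo3/extension-module"]
```

With this in place, `pip install .` or `maturin build --release` produces the wheel, and maturin's own GitHub Actions generator can emit the release workflow.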
-
Which programming language to focus on for my PhD journey in bioinformatics?
Python first; you will be able to experiment quickly with the notebooks. Then maybe write (or rewrite) some modules in Rust that you can expose as Python modules, with PyO3 and maturin. Feel free to publish useful packages on both crates.io and pypi.org, so you can contribute to both the Python and Rust ecosystems.
-
python to rust migration
Now if you really want to use Rust, you can rewrite only the parts that are slowing down your consumer. It's easy using PyO3 and maturin. Maybe also rayon to parallelize.
-
Ask HN: Is it worth it for me to learn Go or Rust as a Data Engineer?
It's relatively easy to extend Python with projects like PyO3 [0] and Maturin [1]. Polars [2] is the perfect example of that.
It's not easy to push coworkers/companies to use an unfamiliar language. Rust isn't fast to learn. You need very good arguments and a good use case to make it work.
I doubt that learning Rust will help you more than learning more about data engineering tools, so this isn't really "worth" your time.
[0] -- https://pyo3.rs/v0.18.3/
[1] -- https://github.com/PyO3/maturin
[2] -- https://www.pola.rs/
- Rust CLI app installable via PIP?
-
Blog Post: Making Python 100x faster with less than 100 lines of Rust
In this case, PyO3/maturin does all the setup and gets the module into Python. They also have docs that go into a lot more depth on this.
-
Is Rust faster than Python out of the box
Lastly, if you're willing to introduce Rust, I'd consider a gradual approach using native libraries built in Rust with PyO3. Check the maturin guide (https://github.com/PyO3/maturin), which helps you streamline the build process for native libraries. From there you could try to find hotspots in your Python app and replace those with a native implementation.
- sccache now supports GHA as backend
What are some alternatives?
server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Poetry - Python packaging and dependency management made easy
MNN - MNN is a blazing fast, lightweight deep learning framework, battle-tested by business-critical use cases in Alibaba
setuptools-rust - Setuptools plugin for Rust support
flashlight - A C++ standalone library for machine learning
termux-packaging - Termux packaging tools.
XLA.jl - Julia on TPUs
PyOxidizer - A modern Python application packaging and distribution tool
oneflow - OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.
rust-numpy - PyO3-based Rust bindings of the NumPy C-API
glow - Compiler for Neural Network hardware accelerators
pybind11 - Seamless operability between C++11 and Python