- mosec — A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
- inference-benchmark — Benchmark for machine learning model online serving (LLM, embedding, Stable Diffusion, Whisper).
- Yes, C++ would be better, but you can try mosec. It has a Python interface and handles all the hard parts of Python multiprocessing for you. The web service layer is implemented in Rust, so it's fast enough for machine learning services.
I have done some benchmarks before: https://github.com/tensorchord/inference-benchmark
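Dynamic batching, the feature mosec highlights, means grouping concurrent requests into a single model call: block until the first request arrives, then wait briefly for more before dispatching. A minimal stdlib-only sketch of that idea (an illustration of the technique, not mosec's actual implementation; `batch_collect` and its parameters are hypothetical names):

```python
import queue
import time

def batch_collect(q, max_batch_size=8, max_wait=0.01):
    """Collect up to max_batch_size requests from q, waiting at most
    max_wait seconds after the first request arrives."""
    batch = [q.get()]  # block until at least one request is available
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # window expired with no further requests
    return batch

# Example: five queued requests, batched three at a time.
requests = queue.Queue()
for i in range(5):
    requests.put(i)
print(batch_collect(requests, max_batch_size=3))  # [0, 1, 2]
```

The `max_wait` window is the usual latency/throughput knob: a longer window yields larger batches (better GPU utilization) at the cost of added per-request latency.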
Look at something like https://github.com/huggingface/text-generation-inference to get an idea of how to do this (they use a lot of optimizations), or just use/adapt that project directly if you prefer.