- mosec — A high-performance ML model serving framework offering dynamic batching and CPU/GPU pipelines to fully exploit your compute resources.
- inference-benchmark — Benchmark for machine learning model online serving (LLM, embedding, Stable Diffusion, Whisper).
- Yes, C++ would be better, but you can try mosec. It has a Python interface and handles all the hard parts of Python multiprocessing for you. The web service layer is implemented in Rust, so it's fast enough for machine learning services.
I have done some benchmarks before: https://github.com/tensorchord/inference-benchmark
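Dynamic batching, the feature mosec highlights, means grouping concurrent requests into a single model call: block until the first request arrives, then wait briefly for more before dispatching. A minimal stdlib-only sketch of that idea (an illustration of the technique, not mosec's actual implementation; `batch_collect` and its parameters are hypothetical names):

```python
import queue
import time

def batch_collect(q, max_batch_size=8, max_wait=0.01):
    """Collect up to max_batch_size requests from q, waiting at most
    max_wait seconds after the first request arrives."""
    batch = [q.get()]  # block until at least one request is available
    deadline = time.monotonic() + max_wait
    while len(batch) < max_batch_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break  # window expired with no further requests
    return batch

# Example: five queued requests, batched three at a time.
requests = queue.Queue()
for i in range(5):
    requests.put(i)
print(batch_collect(requests, max_batch_size=3))  # [0, 1, 2]
```

The `max_wait` window is the usual latency/throughput knob: a longer window yields larger batches (better GPU utilization) at the cost of added per-request latency.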
Look at something like https://github.com/huggingface/text-generation-inference to get an idea of how to do this (they use a lot of optimizations), or just use/adapt that project directly if you prefer.