[D] Handling Concurrent Request for ML Model API

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • mosec

    A high-performance ML model serving framework that offers dynamic batching and CPU/GPU pipelines to fully exploit your compute machine

  • Yes, C++ would be better, but you can try mosec. It has a Python interface and handles the difficult parts of Python multiprocessing for you. The web service part is implemented in Rust, so it's fast enough for machine learning services.

  • inference-benchmark

    Benchmark for machine learning model online serving (LLM, embedding, Stable-Diffusion, Whisper)

  • I have done some benchmarks before: https://github.com/tensorchord/inference-benchmark

  • text-generation-inference

    Large Language Model Text Generation Inference

  • Look at something like https://github.com/huggingface/text-generation-inference to get an idea of how to do this (they use a lot of optimizations), or just use or adapt the project instead.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.
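
The mosec description above mentions dynamic batching as a way to handle concurrent requests. As a rough illustration of the idea only, here is a minimal sketch in plain Python asyncio. It does not use mosec's API, and `fake_model`, `DynamicBatcher`, `max_batch_size`, and `max_wait_s` are hypothetical names chosen for this example. Concurrent callers enqueue their inputs, and a single worker groups whatever arrives within a short window into one batched model call.

```python
import asyncio


def fake_model(batch):
    """Hypothetical stand-in for a model's batched forward pass."""
    return [x * 2 for x in batch]


class DynamicBatcher:
    """Illustrative only: groups concurrent requests into batches."""

    def __init__(self, model_fn, max_batch_size=8, max_wait_s=0.01):
        self.model_fn = model_fn
        self.max_batch_size = max_batch_size
        self.max_wait_s = max_wait_s
        self.queue = asyncio.Queue()
        self.batch_sizes = []  # recorded only so the example can be inspected

    async def predict(self, item):
        # Each caller gets a future that the worker resolves later.
        fut = asyncio.get_running_loop().create_future()
        await self.queue.put((item, fut))
        return await fut

    async def run(self):
        loop = asyncio.get_running_loop()
        while True:
            # Block until at least one request is waiting.
            pending = [await self.queue.get()]
            deadline = loop.time() + self.max_wait_s
            # Collect more requests until the batch is full or time is up.
            while len(pending) < self.max_batch_size:
                timeout = deadline - loop.time()
                if timeout <= 0:
                    break
                try:
                    pending.append(
                        await asyncio.wait_for(self.queue.get(), timeout)
                    )
                except asyncio.TimeoutError:
                    break
            inputs = [item for item, _ in pending]
            # Run the blocking model call in a thread, off the event loop.
            outputs = await loop.run_in_executor(None, self.model_fn, inputs)
            for (_, fut), out in zip(pending, outputs):
                if not fut.done():
                    fut.set_result(out)
            self.batch_sizes.append(len(pending))


async def main():
    batcher = DynamicBatcher(fake_model, max_batch_size=4, max_wait_s=0.05)
    worker = asyncio.create_task(batcher.run())
    # Ten concurrent "requests" share a handful of batched model calls.
    results = await asyncio.gather(*(batcher.predict(i) for i in range(10)))
    worker.cancel()
    return results, batcher.batch_sizes


if __name__ == "__main__":
    print(asyncio.run(main()))
```

The batch size and wait window trade latency for throughput: a longer wait fills larger batches but delays the first request in each batch. A real serving framework would also need error handling, backpressure, and multiple worker processes, which this sketch leaves out.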
