What are the current fastest multi-gpu inference frameworks?



accelerate

Mentions: 18 | Stars: 6,996 | Activity: 9.7 | Language: Python

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support

So I rented a cloud server today to try out some of the recent LLMs like Falcon and Vicuna. I started with Hugging Face's generate API using accelerate. It got about 2 instances/s with 8 A100 40GB GPUs, which I think is a bit slow. I was using batch size = 1 since I don't know how to do batched inference with the .generate API. I already applied torch.compile + bf16. Is there an even faster multi-GPU inference framework? With 8 GPUs I was hoping for MUCH higher throughput, like ~10 or 20 instances per second (or is that possible at all? I am pretty new to this field).
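A rough sketch of the two knobs the post is missing, written in plain Python with no dependencies so the logic is visible. (1) Data parallelism: each of the 8 GPUs holds its own copy of the model and works on a disjoint slice of the prompts. This assumes the model fits on one 40 GB card; a 7B model in bf16 is about 14 GB of weights, a 40B model is not. (2) Batching within each GPU: prompts are left-padded to a common length with an attention mask. The helper names shard_prompts, make_batches and left_pad are hypothetical, and rank/world_size would in practice come from a launcher such as accelerate launch or torchrun. In Hugging Face transformers, the left_pad step corresponds to setting tokenizer.padding_side = "left" and calling the tokenizer with padding=True before passing input_ids and attention_mask to model.generate; models without a pad token often need tokenizer.pad_token = tokenizer.eos_token first. Whether this reaches 10 to 20 instances/s depends on model size and output length, so treat it as a starting point, not a benchmark.

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def shard_prompts(items: Sequence[T], world_size: int, rank: int) -> List[T]:
    """Data parallelism: GPU `rank` takes every `world_size`-th item, so the
    shards are disjoint and together cover the whole input."""
    return list(items[rank::world_size])


def make_batches(items: Sequence[T], batch_size: int) -> List[List[T]]:
    """Chunk a shard into consecutive batches (the last one may be smaller)."""
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


def left_pad(batch: Sequence[List[int]], pad_id: int) -> Tuple[List[List[int]], List[List[int]]]:
    """Left-pad token-id sequences to a common length and build the attention mask.

    Padding goes on the left so each prompt's last real token sits in the final
    column, which is where a decoder-only model continues generating from.
    The mask is 1 for real tokens and 0 for padding.
    """
    max_len = max(len(seq) for seq in batch)
    input_ids: List[List[int]] = []
    attention_mask: List[List[int]] = []
    for seq in batch:
        n_pad = max_len - len(seq)
        input_ids.append([pad_id] * n_pad + list(seq))
        attention_mask.append([0] * n_pad + [1] * len(seq))
    return input_ids, attention_mask


if __name__ == "__main__":
    # Toy token ids standing in for tokenized prompts (made-up values).
    tokenized = [[5, 6, 7], [8, 9], [10], [11, 12, 13, 14], [15, 16]]
    # Pretend there are 2 GPUs and this process is GPU 0: it gets items 0, 2, 4.
    my_share = shard_prompts(tokenized, world_size=2, rank=0)
    for batch in make_batches(my_share, batch_size=2):
        ids, mask = left_pad(batch, pad_id=0)
        print(ids, mask)
```

Running it prints [[5, 6, 7], [0, 0, 10]] [[1, 1, 1], [0, 0, 1]] followed by [[15, 16]] [[1, 1]]. In a real script, each padded batch would be turned into tensors and handed to the model's generate call on that GPU, so all 8 cards decode several prompts at once instead of one.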

FastChat

Mentions: 83 | Stars: 33,877 | Activity: 9.6 | Language: Python

An open platform for training, serving, and evaluating large language models. Release repo for Vicuna and Chatbot Arena.

Vicuna comes with FastChat, though I'm not sure how flexible it is to configure.

ChatGLM-6B

Mentions: 17 | Stars: 39,341 | Activity: 8.4 | Language: Python

ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

ChatGLM seems to be pretty popular, but I've never used it.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.
