FlexGen vs accelerate

FlexGen

Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput generation. [Moved to: https://github.com/FMInference/FlexGen] (by Ying1123)

DISCONTINUED

accelerate

🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support (by huggingface)

Project	FlexGen	accelerate
Mentions	19	18
Stars	5,350	6,948
Growth	-	5.0%
Activity	10.0	9.7
Latest Commit	about 1 year ago	4 days ago
Language	Python	Python
License	Apache License 2.0	Apache License 2.0

Mentions: the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars: the number of stars that a project has on GitHub. Growth: month-over-month growth in stars.
Activity: a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
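
As a rough illustration of the two derived metrics, the sketch below computes month-over-month star growth and maps a project's rank to an activity score. The function names and the linear mapping from "top X%" to a 0-10 score are assumptions made for this example. The site's actual formula, including how recent commits are weighted, is not given here.

```python
def month_over_month_growth(stars_now: int, stars_prev: int) -> float:
    """Percentage change in stars relative to the previous month."""
    if stars_prev <= 0:
        raise ValueError("stars_prev must be positive")
    return (stars_now - stars_prev) / stars_prev * 100.0


def activity_from_rank(top_fraction: float) -> float:
    """Hypothetical linear mapping: being in the top 10% gives 9.0.

    Assumption for illustration only; the real score also weights
    recent commits more heavily than older ones.
    """
    if not 0.0 <= top_fraction <= 1.0:
        raise ValueError("top_fraction must be between 0 and 1")
    return round(10.0 * (1.0 - top_fraction), 1)


if __name__ == "__main__":
    # Made-up star counts, chosen only to show the arithmetic.
    print(f"{month_over_month_growth(1050, 1000):.1f}%")
    print(activity_from_rank(0.10))
```

Under this assumed mapping, a score of 9.7 would correspond to roughly the top 3%, but that reading holds only if the mapping is in fact linear.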

FlexGen

Posts with mentions or reviews of FlexGen. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-03-16.

Training LLaMA-65B with Stanford Code
3 projects | /r/Oobabooga | 16 Mar 2023

#1: Progress Update | 4 comments
#2: the default UI on the pinned Google Colab is buggy so I made my own frontend - YAFFOA. | 18 comments
#3: Paper reduces resource requirement of a 175B model down to 16GB GPU | 19 comments
Replika users fell in love with their AI chatbot companions. Then they lost them
2 projects | news.ycombinator.com | 2 Mar 2023

It's really just a gpu vram limitation: affordable GPUs are rather memory starved.
Fortunately people have started writing implementations for pipelining across multiple gpus.
https://github.com/Ying1123/FlexGen
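
A back-of-the-envelope sketch of why VRAM is the bottleneck: the weights of a 175B-parameter model alone exceed the memory of a 16GB GPU at common precisions. The helper name and the list of precisions are illustrative assumptions, and the estimate ignores activations and the KV cache.

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "int8": 1.0, "4-bit": 0.5}


def weight_memory_gb(num_params: float, precision: str) -> float:
    """Memory for the weights alone, in GB (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    params = 175e9  # model size mentioned in the posts above
    gpu_gb = 16     # GPU size mentioned in the posts above
    for precision in BYTES_PER_PARAM:
        need = weight_memory_gb(params, precision)
        print(f"{precision}: {need:.1f} GB of weights, "
              f"{need / gpu_gb:.1f}x a {gpu_gb} GB GPU")
```

This gap is what offloading weights to CPU memory or disk, or splitting a model across several GPUs, tries to bridge.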
Same as with Stable Diffusion, new AI based LAION, are coming up slowly but surely: Paper reduces resource requirement of a 175B model down to 16GB GPU
1 project | /r/StableDiffusion | 21 Feb 2023
And Here..We..Go: Running large language models like ChatGPT on a single GPU. Up to 100x faster than other offloading systems
1 project | /r/Newsoku_L | 21 Feb 2023
When, how and why will this Stable Diffusion spring stop?
2 projects | /r/StableDiffusion | 20 Feb 2023

Actually, there's a solution: read this paper https://github.com/Ying1123/FlexGen/blob/main/docs/paper.pdf
Exciting new shit.
3 projects | /r/PygmalionAI | 20 Feb 2023

Flexgen - Run big models on your small GPU https://github.com/Ying1123/FlexGen
Paper reduces resource requirement of a 175B model down to 16GB GPU
2 projects | /r/ChatGPTforall | 20 Feb 2023
FlexGen - Run 175B Parameter Models on consumer hardware
1 project | /r/ChatGPT | 20 Feb 2023
Running large language models like ChatGPT on a single GPU
1 project | /r/patient_hackernews | 20 Feb 2023
FlexGen: Running large language models like ChatGPT/GPT-3/OPT-175B on a single GPU
1 project | /r/mlscaling | 20 Feb 2023

accelerate

Posts with mentions or reviews of accelerate. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-06.

Can we discuss MLOps, Deployment, Optimizations, and Speed?
7 projects | /r/LocalLLaMA | 6 Dec 2023

accelerate is a best-in-class lib for deploying models, especially across multi-gpu and multi-node.
Code Llama - The Hugging Face Edition
3 projects | /r/LocalLLaMA | 27 Aug 2023

In the coming days, we'll work on sharing scripts to train models, optimizations for on-device inference, even nicer demos (and for more powerful models), and more. Feel free to like our GitHub repos (transformers, peft, accelerate). Enjoy!
What are the current fastest multi-gpu inference frameworks?
3 projects | /r/LocalLLaMA | 25 Jun 2023

So I rented a cloud server today to try out some of the recent LLMs like Falcon and Vicuna. I started with Hugging Face's generate API using accelerate. It got about 2 instances/s with 8 A100 40GB GPUs, which I think is a bit slow. I was using batch size = 1 since I do not know how to do multi-batch inference using the .generate API. I already used torch.compile + bf16. Is there an even faster multi-GPU inference framework? I have 8 GPUs, so I was hoping for a MUCH faster speed, like ~10 or 20 instances per second (or is that possible at all? I am pretty new to this field).
Looking at lefnire's suggestion of splitting huggingface batches per gradient_accumulation_steps
1 project | /r/BuildThisAI | 16 Jun 2023

Looking through https://github.com/huggingface/accelerate/tree/main/src/accelerate/utils/ I think it might be feasible, but will require some modifications to:
Have to abandon my (almost) finished LLaMA-API-Inference server. If anybody finds it useful and wants to continue, the repo is yours. :)
3 projects | /r/LocalLLaMA | 18 May 2023

As /u/RabbitHole32 already mentioned, the speed increase stems from a patch that modifies how a certain large tensor is distributed between the GPUs. The patch was created by /u/emvw7yf. Here you can find the respective GitHub issue: https://github.com/huggingface/accelerate/issues/1394
Help please! SD installation broken
1 project | /r/SDtechsupport | 3 Apr 2023

::pip install git+https://github.com/huggingface/accelerate
Batch Controlnet
1 project | /r/StableDiffusion | 8 Mar 2023

pip install controlnet_aux
pip install diffusers transformers git+https://github.com/huggingface/accelerate.git
[D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM
4 projects | /r/MachineLearning | 20 Feb 2023

Try to use both GPUs with this one: https://github.com/huggingface/accelerate https://huggingface.co/docs/accelerate/usage_guides/big_modeling https://huggingface.co/blog/accelerate-large-models Maybe it will help (the last link is clearer IMHO).
Fine Tuning Stable Diffusion with Dreambooth from Within My Python Code
1 project | /r/StableDiffusion | 19 Jan 2023

I read through this page on accelerate, but it's not clear to me how arguments such as instance_prompt get passed in.
What does ACCELERATE do in AUTOMATIC1111?
3 projects | /r/StableDiffusion | 20 Nov 2022

To activate it, you have to uncomment line 44 of webui-user.sh and add set ACCELERATE="True" to webui-user.bat. It seems to use huggingface/accelerate (Microsoft DeepSpeed, ZeRO paper).

What are some alternatives?

When comparing FlexGen and accelerate you can also consider the following projects:

text-generation-webui - A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.

DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

CTranslate2 - Fast inference engine for Transformer models

bitsandbytes - Accessible large language models via k-bit quantization for PyTorch.

ggml - Tensor library for machine learning

horovod - Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

rust-bert - Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)

ChatGLM-6B - ChatGLM-6B: An Open Bilingual Dialogue Language Model | 开源双语对话语言模型

stanford_alpaca - Code and documentation to train Stanford's Alpaca models, and generate the data.

unsloth - Finetune Llama 3, Mistral & Gemma LLMs 2-5x faster with 80% less memory

stable-diffusion-webui - Stable Diffusion web UI
