FlexGen Alternatives

Similar projects and alternatives to FlexGen

stable-diffusion-webui

2,808 129,299 9.9 Python FlexGen VS stable-diffusion-webui

Stable Diffusion web UI
text-generation-webui

876 35,862 9.9 Python FlexGen VS text-generation-webui

A Gradio web UI for Large Language Models. Supports transformers, GPTQ, AWQ, EXL2, llama.cpp (GGUF), Llama models.
llama.cpp

769 55,846 10.0 C++ FlexGen VS llama.cpp

LLM inference in C/C++
whisper

343 59,916 6.8 Python FlexGen VS whisper

Robust Speech Recognition via Large-Scale Weak Supervision
Open-Assistant

329 36,622 9.1 Python FlexGen VS Open-Assistant

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.
whisper.cpp

187 30,942 9.8 C FlexGen VS whisper.cpp

Port of OpenAI's Whisper model in C/C++
llama

184 52,603 8.1 Python FlexGen VS llama

Inference code for Llama models
autocomplete

164 24,265 9.6 TypeScript FlexGen VS autocomplete

IDE-style autocomplete for your existing terminal & shell
nebuly

105 8,367 8.4 Python FlexGen VS nebuly

The user analytics platform for LLMs
petals

98 8,661 8.5 Python FlexGen VS petals

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading
RWKV-LM

84 11,619 8.8 Python FlexGen VS RWKV-LM

RWKV is an RNN with transformer-level LLM performance. It can be directly trained like a GPT (parallelizable). So it's combining the best of RNN and transformer - great performance, fast inference, saves VRAM, fast training, "infinite" ctx_len, and free sentence embedding.
dalai

59 13,044 6.5 CSS FlexGen VS dalai

The simplest way to run LLaMA on your local machine
IOPaint

48 16,993 9.5 Python FlexGen VS IOPaint

Image inpainting tool powered by SOTA AI models. Remove any unwanted object, defect, or person from your pictures, or erase and replace (powered by Stable Diffusion) anything in your pictures.
serge

40 5,535 9.8 Svelte FlexGen VS serge

A web interface for chatting with Alpaca through llama.cpp. Fully dockerized, with an easy to use API.
whisper-asr-webservice

11 1,617 7.7 Python FlexGen VS whisper-asr-webservice

OpenAI Whisper ASR Webservice API
llama-cpu

9 775 3.1 Python FlexGen VS llama-cpu

Fork of Facebook's LLaMA model to run on CPU
minimal-llama

4 456 8.5 Python FlexGen VS minimal-llama
text-generation-inference

29 7,800 9.6 Python FlexGen VS text-generation-inference

Large Language Model Text Generation Inference
fickling

7 322 8.6 Python FlexGen VS fickling

A Python pickling decompiler and static analyzer
llama-int8

6 1,044 3.6 Python FlexGen VS llama-int8

Quantized inference code for LLaMA models

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number indicates a closer FlexGen alternative or higher similarity.


FlexGen reviews and mentions

Posts with mentions or reviews of FlexGen. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-12-03.

Run 70B LLM Inference on a Single 4GB GPU with This New Technique
3 projects | news.ycombinator.com | 3 Dec 2023
Colorful Custom RTX 4060 Ti GPU Clocks Outed, 8 GB VRAM Confirmed
1 project | /r/hardware | 17 Apr 2023
Local Alternatives of ChatGPT and Midjourney
17 projects | /r/selfhosted | 12 Apr 2023

LLaMA, Pythia, RWKV, Flan-T5 (self-hosted), FlexGen
FlexGen: Running large language models on a single GPU
1 project | /r/hypeurls | 26 Mar 2023

1 project | /r/patient_hackernews | 26 Mar 2023

1 project | /r/hackernews | 26 Mar 2023

4 projects | news.ycombinator.com | 25 Mar 2023
Show HN: Finetune LLaMA-7B on commodity GPUs using your own text
16 projects | news.ycombinator.com | 21 Mar 2023
> With no real knowledge of LLMs, and having only recently started to understand what LLM terms mean, such as 'model, inference, LLM model, instruction set, fine tuning', what else do you think is required to make a tool like yours?
This was me a few weeks ago. I got interested in all this when FlexGen (https://github.com/FMInference/FlexGen) was announced, which made it possible to run inference with the OPT model on consumer hardware. I'm an avid user of Stable Diffusion, and I wanted to see if I could have an SD equivalent of ChatGPT.
Not understanding the details of hyperparameters or terminology, I basically asked ChatGPT to explain to me what these things are:
```
   Explain to someone who is a software engineer with limited knowledge of ML terms or linear algebra, what is "feed forward" and "self-attention" in the context of ML and large language models. Provide examples when possible.
```
Could this new flexgen be used in place of GPTq? or is this different?
1 project | /r/Oobabooga | 18 Mar 2023
OpenAI is expensive
2 projects | /r/GPT3 | 17 Mar 2023