Running large language models like ChatGPT on a single GPU

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • FlexGen

    Discontinued: Running large language models like OPT-175B/GPT-3 on a single GPU, focusing on high-throughput generation. [Moved to: https://github.com/FMInference/FlexGen] (by Ying1123)

  • ggml

    Tensor library for machine learning

  • I don't know about these large models, but I saw a random HN comment earlier in a different thread where someone showed a GPT-J model running on CPU only: https://github.com/ggerganov/ggml

    I tested it on my Linux machine and my MacBook Air (M1), and it generates tokens at a reasonable speed using the CPU only. I noticed it doesn't quite use all of my available CPU cores, so it may be leaving some performance on the table; I'm not sure, though.

    GPT-J 6B is nowhere near as large as the OPT-175B in the post, but it gave me the sense that CPU-only inference may not be totally hopeless even for large models, if only we had some high-quality software to do it.
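
    For context on what ggml is doing under the hood: it builds a static compute graph over tensors allocated from one pre-sized memory pool, then evaluates the graph on the CPU across a configurable number of threads. Below is a minimal sketch against the C API roughly as it looked in early 2023; the library moves quickly, so treat the exact names and signatures as illustrative rather than current.

    ```c
    // Minimal sketch of ggml's compute-graph API, roughly as of early 2023
    // (the API has changed since; names and signatures are illustrative).
    #include <stdio.h>
    #include "ggml.h"

    int main(void) {
        // Every tensor lives in one user-sized memory pool; no per-tensor malloc.
        struct ggml_init_params params = {
            .mem_size   = 16 * 1024 * 1024, // 16 MiB pool, ample for this toy graph
            .mem_buffer = NULL,             // NULL => ggml allocates the pool itself
        };
        struct ggml_context * ctx = ggml_init(params);

        // Describe f = a*x + b as a graph; nothing is computed yet.
        struct ggml_tensor * x = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
        struct ggml_tensor * a = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
        struct ggml_tensor * b = ggml_new_tensor_1d(ctx, GGML_TYPE_F32, 1);
        struct ggml_tensor * f = ggml_add(ctx, ggml_mul(ctx, a, x), b);

        struct ggml_cgraph gf = ggml_build_forward(f);
        gf.n_threads = 4; // CPU threads for evaluation (cf. the core-usage remark above)

        // Fill in the inputs, then evaluate the whole graph on the CPU.
        ggml_set_f32(x, 2.0f);
        ggml_set_f32(a, 3.0f);
        ggml_set_f32(b, 4.0f);
        ggml_graph_compute(ctx, &gf);

        printf("f = %f\n", ggml_get_f32_1d(f, 0)); // 3*2 + 4 = 10
        ggml_free(ctx);
        return 0;
    }
    ```

    The repo's gpt-j example follows the same pattern at scale: the transformer layers are assembled into one graph, the weights are loaded from a converted model file, and the thread count is exposed as a parameter.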

  • rust-bert

    Rust native ready-to-use NLP pipelines and transformer-based models (BERT, DistilBERT, GPT2,...)

  • Give this a look: https://github.com/guillaume-be/rust-bert

    If you have PyTorch configured correctly, this should "just work" for a lot of the smaller models. It won't be a 1:1 ChatGPT replacement, but you can build some pretty cool stuff with it.

    > it's basically Python or bust in this space

    More or less, but that doesn't have to be a bad thing. If you're on Apple Silicon, you have plenty of performance headroom to deploy Python code for this. I've gotten this library to work on systems with as little as 2 GB of memory, so outside of ultra-low-end use cases, you should be fine.
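
    For a concrete starting point, here is roughly what a rust-bert text-generation pipeline looks like (a GPT-2-class model by default). This is a sketch based on the library's documented pipeline API; exact signatures vary across rust-bert versions, and it assumes rust-bert and anyhow in Cargo.toml plus a libtorch installation that the tch crate can find.

    ```rust
    use rust_bert::pipelines::text_generation::TextGenerationModel;

    fn main() -> anyhow::Result<()> {
        // Downloads pretrained weights on first run; needs a working libtorch setup.
        let model = TextGenerationModel::new(Default::default())?;

        // Generate continuations for a batch of prompts.
        let output = model.generate(&["The dog", "The cat was"], None);
        for sentence in output {
            println!("{sentence}");
        }
        Ok(())
    }
    ```

    The other pipelines (question answering, summarization, sentiment analysis) under rust_bert::pipelines follow the same construct-then-call shape.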

  • CTranslate2

    Fast inference engine for Transformer models

  • Open-Assistant

    OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

  • stable-horde-notebook

    A Jupyter notebook for Stable Horde, for use in Google Colab, etc.

  • https://github.com/aqualxx/stable-horde-notebook

    My only problem with Stable Horde is that their anti-CSAM measure involves checking the prompt for words like "small", meaning I can't use an NSFW-capable model with certain prompts ("holding a very small bag", etc.). That, and seeing great things in the image ratings and being unable to reproduce them, because it doesn't provide the prompt.

