And Here..We..Go: Running large language models like ChatGPT on a single GPU. Up to 100x faster than other offloading systems

FlexGen

19 5,350 10.0 Python

Discontinued. Running large language models like OPT-175B/GPT-3 on a single GPU. Focusing on high-throughput generation. [Moved to: https://github.com/FMInference/FlexGen] (by Ying1123)

Open-Assistant

329 36,622 9.1 Python

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

This is awesome! Hopefully this helps accelerate the creation of Open Assistant. (It's an open source large language model in the works)

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

Run 70B LLM Inference on a Single 4GB GPU with This New Technique
3 projects | news.ycombinator.com | 3 Dec 2023
Colorful Custom RTX 4060 Ti GPU Clocks Outed, 8 GB VRAM Confirmed
1 project | /r/hardware | 17 Apr 2023
FlexGen: Running large language models on a single GPU
1 project | /r/hypeurls | 26 Mar 2023
FlexGen: Running large language models on a single GPU
1 project | /r/patient_hackernews | 26 Mar 2023
FlexGen: Running large language models on a single GPU
1 project | /r/hackernews | 26 Mar 2023

This page summarizes the projects mentioned and recommended in the original post on /r/singularity
Tags: chatgpt, Deep Learning, gpt-3, high-throughput, large-language-models
Post date: 20 Feb 2023
