[D] When ChatGPT stops being free: Run SOTA LLM in cloud

DeepSpeed-MII

6 1,629 8.7 Python

MII makes low-latency and high-throughput inference possible, powered by DeepSpeed.

Microsoft/DeepSpeed-MII offers up to a 40x reduction in inference cost on Azure. It also supports int8 and fp16 BLOOM out of the box, but it fails on Azure due to instance size.
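To get a feel for why instance size becomes the blocker, here is a rough back-of-the-envelope sketch of how much memory the weights alone need at each precision. It assumes BLOOM's published size of roughly 176 billion parameters (a figure not stated in the post) and ignores activations, KV cache, and framework overhead, so real requirements are higher. The function name `weight_memory_gb` is just an illustrative helper, not part of DeepSpeed-MII.

```python
# Bytes needed to store a single parameter at each precision.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_memory_gb(num_params: int, precision: str) -> float:
    """Memory for the weights alone, in GB (10**9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


# Assumption: BLOOM has roughly 176 billion parameters.
BLOOM_PARAMS = 176_000_000_000

for precision in ("fp16", "int8"):
    gb = weight_memory_gb(BLOOM_PARAMS, precision)
    print(f"{precision}: {gb:.0f} GB")
# fp16: 352 GB
# int8: 176 GB
```

Even in int8 the weights do not fit on a single GPU, so the model has to be sharded across several devices on one large instance or across several instances.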

xformers

46 7,578 9.3 Python

Hackable and optimized Transformers building blocks, supporting a composable construction.

facebook/xformers: not sure, but if I remember correctly this brought inference requirements down to 4GB VRAM for Stable Diffusion and DreamBooth fine-tuning down to 10GB. No idea whether it is useful for reducing BLOOM(Z) inference cost, though.
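For context on where savings like these can come from: standard attention stores a score matrix of shape (batch, heads, seq_len, seq_len), which grows quadratically with sequence length, while memory-efficient attention kernels compute the same result in tiles without holding that matrix whole. The sketch below only does the arithmetic for the naive case. The batch size, head count, and sequence lengths are illustrative assumptions, not measurements of Stable Diffusion or BLOOM, and `attention_scores_bytes` is a hypothetical helper, not an xformers function.

```python
def attention_scores_bytes(batch: int, heads: int, seq_len: int,
                           bytes_per_value: int = 2) -> int:
    """Size of the full (batch, heads, seq_len, seq_len) score matrix.

    bytes_per_value defaults to 2, i.e. fp16.
    """
    return batch * heads * seq_len * seq_len * bytes_per_value


# Illustrative values only: 1 sample, 8 heads, fp16 scores.
for seq_len in (1024, 4096):
    mib = attention_scores_bytes(1, 8, seq_len) / 2**20
    print(f"seq_len={seq_len}: {mib:.0f} MiB per attention layer")
# seq_len=1024: 16 MiB per attention layer
# seq_len=4096: 256 MiB per attention layer
```

Quadrupling the sequence length multiplies the score matrix by 16, which is why avoiding it matters most for long sequences. Whether that translates into a meaningful saving for BLOOM(Z) inference is a separate question that this arithmetic does not answer.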

petals

98 8,661 8.5 Python

🌸 Run LLMs at home, BitTorrent-style. Fine-tuning and inference up to 10x faster than offloading

Another option is to work with or contribute to a distributed implementation of large language models. The Petals project runs BLOOM over a decentralized network of small workers (minimum requirement: 8GB VRAM).

Open-Assistant

329 36,647 8.3 Python

OpenAssistant is a chat-based assistant that understands tasks, can interact with third-party systems, and retrieve information dynamically to do so.

Update: Found LAION-AI/Open-Assistant, a very promising project open-sourcing the idea of ChatGPT. Video here.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.