accelerate is a best-in-class lib for deploying models, especially across multi-GPU and multi-node setups.
transformers uses accelerate under the hood if you load a model with device_map='auto'.
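A minimal sketch of that, assuming transformers and accelerate are installed (the model id here is just a placeholder, any causal LM works):

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-2-7b-hf"  # placeholder model id
tokenizer = AutoTokenizer.from_pretrained(model_id)

# device_map="auto" hands placement to accelerate, which spreads the
# layers across available GPUs and spills to CPU/disk if they don't fit.
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

inputs = tokenizer("Hello", return_tensors="pt").to(model.device)
out = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0], skip_special_tokens=True))
```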
The unsloth project offers some low-level optimizations for Llama-family models, and as of today some preliminary Mistral support (Mistral reportedly uses a Llama-style architecture).
llama.cpp is a great resource for running quantized models, and even though it's named after Llama, it's the go-to backend for basically all LLMs right now (ctransformers is dead).
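For example, through the llama-cpp-python bindings it's just a couple of lines (the GGUF path is a placeholder, and n_gpu_layers only matters if the wheel was built with GPU support):

```python
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=2048,        # context window
    n_gpu_layers=-1,   # offload all layers to the GPU if available
)

out = llm("Q: Name the planets in the solar system. A:", max_tokens=64)
print(out["choices"][0]["text"])
```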
DeepSpeed can handle parallelism concerns, and can even offload optimizer state and model weights to CPU RAM, or even NVMe (!). I'm surprised I don't see this project used more.
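For anyone curious, here's a hedged sketch of what that offloading looks like as a ZeRO-3 config (the nvme_path, model, and hyperparameters are all placeholders; NVMe offload also needs libaio, and you'd normally run this under the deepspeed launcher):

```python
import torch
import deepspeed

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},  # placeholder lr
    "zero_optimization": {
        "stage": 3,
        # optimizer state goes to CPU RAM, parameters to NVMe
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "nvme", "nvme_path": "/local_nvme"},
    },
}

model = torch.nn.Linear(4096, 4096)  # stand-in for a real model
engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```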
I recently went through the same churn with UniteAI, and had to swap ctransformers back out for llama.cpp.