-
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
accelerate is a best-in-class library for deploying models, especially across multiple GPUs and multiple nodes.
-
transformers uses accelerate under the hood when you load a model with device_map='auto'.
-
The unsloth project offers some low-level optimizations for Llama et al., and as of today some preliminary Mistral support (which I've heard uses the Llama architecture?).
-
llama.cpp is a great resource for running quantized models ("quants"), and even though it's called llama, it's the go-to backend for basically all LLMs right now (ctransformers is dead).
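To make "quants" concrete, here is a simplified, pure-Python sketch of block-wise 8-bit quantization, modeled loosely on the idea behind llama.cpp's Q8_0 format: weights are split into blocks of 32, each block stores one scale (max |w| / 127) plus one small integer per weight. This is an illustration only. The real GGUF formats pack bits differently, store scales as fp16, and round half away from zero, while Python's round() rounds half to even. The function names are hypothetical.

```python
import math

BLOCK_SIZE = 32  # weights per block that share a single scale


def quantize_q8_0(values: list[float]) -> list[tuple[float, list[int]]]:
    """Split values into blocks; keep one float scale and int8-range codes per block."""
    blocks = []
    for start in range(0, len(values), BLOCK_SIZE):
        block = values[start:start + BLOCK_SIZE]
        amax = max(abs(v) for v in block)
        scale = amax / 127.0  # the largest magnitude maps to +/-127
        if scale == 0.0:
            codes = [0] * len(block)  # all-zero block
        else:
            codes = [round(v / scale) for v in block]  # nearest integer in [-127, 127]
        blocks.append((scale, codes))
    return blocks


def dequantize_q8_0(blocks: list[tuple[float, list[int]]]) -> list[float]:
    """Reconstruct approximate weights as scale * code."""
    out: list[float] = []
    for scale, codes in blocks:
        out.extend(scale * c for c in codes)
    return out


if __name__ == "__main__":
    weights = [math.sin(0.1 * i) for i in range(100)]
    restored = dequantize_q8_0(quantize_q8_0(weights))
    print(max(abs(a - b) for a, b in zip(weights, restored)))
```

Each weight now costs about one byte plus a shared per-block scale instead of two or four bytes, and the per-weight error is bounded by half the block's scale. Lower-bit quants push the same trade-off further.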
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
DeepSpeed can handle parallelism concerns and can even offload model state (parameters and optimizer state) to CPU RAM, or even to NVMe (!?). I'm surprised I don't see this project used more.
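The offloading is driven by the DeepSpeed config rather than code changes. Below is a minimal sketch that builds a ZeRO stage 3 config with optimizer state offloaded to CPU RAM and parameters offloaded to NVMe, then writes it to JSON. The batch size, the output filename, and the /local_nvme mount point are hypothetical placeholders. NVMe offload also needs DeepSpeed's async I/O support available on the machine.

```python
import json


def zero3_offload_config(nvme_path: str = "/local_nvme") -> dict:
    """ZeRO-3 config: optimizer state -> CPU RAM, parameters -> NVMe."""
    return {
        "train_batch_size": 16,  # placeholder value
        "zero_optimization": {
            "stage": 3,  # parameter offload requires stage 3
            "offload_optimizer": {"device": "cpu", "pin_memory": True},
            "offload_param": {
                "device": "nvme",
                "nvme_path": nvme_path,  # hypothetical fast local NVMe mount
                "pin_memory": True,
            },
        },
    }


if __name__ == "__main__":
    with open("ds_config.json", "w") as f:
        json.dump(zero3_offload_config(), f, indent=2)
```

The resulting dict (or the JSON file's path) can be passed to deepspeed.initialize via its config argument. Switching "offload_param" to {"device": "cpu"} keeps everything in RAM when no NVMe drive is available.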
-
I recently went through the same thing with UniteAI and had to swap ctransformers back out for llama.cpp.