-
accelerate
🚀 A simple way to launch, train, and use PyTorch models on almost any device and distributed configuration, automatic mixed precision (including fp8), and easy-to-configure FSDP and DeepSpeed support
-
Scout Monitoring
Free Django app performance insights with Scout Monitoring. Get Scout setup in minutes, and let us sweat the small stuff. A couple lines in settings.py is all you need to start monitoring your apps. Sign up for our free tier today.
-
DeepSpeed
DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
-
InfluxDB
Power Real-Time Data Analytics at Scale. Get real-time insights from all types of time series data with InfluxDB. Ingest, query, and analyze billions of data points in real-time with unbounded cardinality.
accelerate is a best-in-class lib for deploying models, especially across multi-gpu and multi-node.
transformers uses accelerate if you call it with device_map='auto'
The unsloth project offers some low-level optimizations for Llama et al, and as of today some prelim Mistral work (which I heard is the llama architecture?)
llama.cpp is a great resource for running Quants, and even though it's called llama, it's the goto backend for basically all LLMs right now (ctransformers is dead)
DeepSpeed can handle parallelism concerns, and even offload data/model to RAM, or even NVMe (!?) . I'm surprised I don't see this project used more.
I recently went through the same with UniteAI, and had to swap ctransformers back out for llama.cpp