RHO-Loss vs DeepSpeed
| | RHO-Loss | DeepSpeed |
|---|---|---|
| Mentions | 1 | 25 |
| Stars | 143 | 9,536 |
| Growth | 3.5% | 9.6% |
| Activity | 5.4 | 9.8 |
| Latest commit | 5 months ago | 6 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
RHO-Loss
- [D] Most important AI papers this year so far in my opinion + proto-AGI speculation at the end
RHO-LOSS - Prioritized Training on Points that are Learnable, Worth Learning, and Not Yet Learnt - trains models 18x faster with higher accuracy. Paper: https://arxiv.org/abs/2206.07137 GitHub: https://github.com/OATML/RHO-Loss
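For context, the selection rule behind RHO-LOSS scores each candidate point by its reducible holdout loss (training loss minus the loss of a model trained on held-out data) and trains only on the top-scoring points. Below is a minimal PyTorch sketch of that rule, assuming a current model and a pre-trained irreducible-loss model; the function names and keep fraction are illustrative, not the repo's API.

```python
# Minimal sketch of RHO-LOSS batch selection (reducible holdout loss).
# Names and hyperparameters are illustrative, not taken from the OATML repo.
import torch
import torch.nn.functional as F

def select_batch(model, irreducible_loss_model, xb, yb, keep_fraction=0.1):
    """Score candidates by training loss minus irreducible (holdout) loss,
    then keep the highest-scoring fraction for the actual gradient step."""
    with torch.no_grad():
        train_loss = F.cross_entropy(model(xb), yb, reduction="none")
        irreducible = F.cross_entropy(irreducible_loss_model(xb), yb, reduction="none")
    reducible = train_loss - irreducible  # high = learnable, worth learning, not yet learnt
    k = max(1, int(keep_fraction * len(yb)))
    idx = torch.topk(reducible, k).indices
    return xb[idx], yb[idx]

# Usage sketch: draw a large candidate batch, train only on the selected points.
# xb, yb = next(candidate_loader)
# x_sel, y_sel = select_batch(model, irreducible_model, xb, yb)
# loss = F.cross_entropy(model(x_sel), y_sel); loss.backward(); optimizer.step()
```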
DeepSpeed
- Apple: Transformer architecture optimized for Apple Silicon
I'm following this closely, together with other efforts like GPTQ Quantization and Microsoft's DeepSpeed, all of which are bringing down the hardware requirements of these advanced AI models.
- Facebook LLAMA is being openly distributed via torrents
- https://github.com/microsoft/DeepSpeed
Anything that could bring this to a 10GB 3080 or 24GB 3090 without 60s/it per token?
- Fine-tuning?
Git clone the DeepSpeed repo, https://github.com/microsoft/DeepSpeed. We need this to fine-tune without using more VRAM than any consumer GPU has. Build DeepSpeed from inside the deepspeed directory: DS_BUILD_OPS=1 DS_BUILD_AIO=0 pip install .
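Once DeepSpeed is built, fine-tuning within consumer VRAM typically means wrapping the model with a ZeRO configuration that offloads optimizer state to CPU. A minimal sketch, assuming the script is launched with the deepspeed launcher; the placeholder model and hyperparameters below are not from the original post.

```python
# Minimal sketch: ZeRO stage 2 with CPU optimizer offload to cut GPU memory
# during fine-tuning. Run via the launcher, e.g. `deepspeed train.py`.
# The model and hyperparameters are placeholders.
import deepspeed
import torch

ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "gradient_accumulation_steps": 8,
    "fp16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu"},  # push optimizer state off the GPU
    },
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
}

model = torch.nn.Linear(1024, 1024)  # placeholder for the model being fine-tuned
engine, optimizer, _, _ = deepspeed.initialize(
    model=model, model_parameters=model.parameters(), config=ds_config
)

# Training loop sketch: engine(inputs) -> loss; engine.backward(loss); engine.step()
```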
- 39.7 it/s with a 4090 on Linux!
I tried installing PyTorch 2.0.0 with Triton from microsoft/DeepSpeed#2694 and compiling my own xformers, and it made my inference even slower: from 17-18 it/s (512x512, batch size 1, any sampling method) down to around 16-17 it/s, and especially with batch size 8, from 5.65 it/s to 4.66 it/s.
- What does ACCELERATE do in AUTOMATIC1111?
To activate it you have to uncomment line 44 in webui-user.sh and add set ACCELERATE="True" to webui-user.bat. It seems to use huggingface/accelerate (Microsoft DeepSpeed, ZeRO paper).
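For reference, huggingface/accelerate wraps an ordinary PyTorch loop so the same script can run on one GPU, several GPUs, or a DeepSpeed backend chosen via accelerate config. A minimal sketch with placeholder model and data, not AUTOMATIC1111's actual code:

```python
# Minimal sketch of huggingface/accelerate: prepare() wraps model, optimizer and
# dataloader so one loop runs on a single GPU, multiple GPUs, or DeepSpeed.
# Model and data are placeholders.
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()  # backend/devices chosen via `accelerate config`
model = torch.nn.Linear(16, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)
loader = DataLoader(TensorDataset(torch.randn(64, 16), torch.randn(64, 1)), batch_size=8)

model, optimizer, loader = accelerator.prepare(model, optimizer, loader)

for x, y in loader:
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    accelerator.backward(loss)  # replaces loss.backward()
    optimizer.step()
```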
- New (simple) Dreambooth method is out, train under 10 minutes without class images on multiple subjects, retrainable-ish model
- [D] Most important AI papers this year so far in my opinion + proto-AGI speculation at the end
DeepSpeed Inference: Enabling Efficient Inference of Transformer Models at Unprecedented Scale - Microsoft 2022. Paper: https://arxiv.org/pdf/2207.00032.pdf GitHub: https://github.com/microsoft/DeepSpeed
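A minimal sketch of the usage pattern DeepSpeed-Inference targets: inject optimized transformer kernels into an existing Hugging Face model and run generation in fp16. The model name is a placeholder and the arguments follow DeepSpeed's inference examples; treat this as illustrative, not the paper's exact setup.

```python
# Minimal sketch of DeepSpeed-Inference kernel injection on a Hugging Face model.
# Requires a CUDA GPU; "gpt2" is a placeholder for any supported causal LM.
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(name)
model = AutoModelForCausalLM.from_pretrained(name)

# replace_with_kernel_inject swaps supported layers for DeepSpeed's fused kernels;
# mp_size > 1 would shard the model across GPUs (tensor parallelism).
engine = deepspeed.init_inference(
    model, mp_size=1, dtype=torch.half, replace_with_kernel_inject=True
)

inputs = tokenizer("DeepSpeed inference", return_tensors="pt").to("cuda")
out = engine.module.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(out[0]))
```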
- [D] Does someone know how much faster DeepSpeed's transformer implementation is?
Implementation here
- Nvidia Fiscal Q3 2022 Financial Result
The announcement described a collaboration involving NVIDIA Megatron-LM and Microsoft DeepSpeed to create an efficient, scalable 3D parallel system capable of combining data, pipeline, and tensor-slicing-based parallelism.
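As a rough sketch of what "3D parallel" means here, the three parallelism degrees multiply together to give the total GPU count; the numbers below are illustrative, not NVIDIA's or Microsoft's actual configuration.

```python
# Conceptual sketch: data-, pipeline- and tensor-parallel degrees multiply
# to the total number of GPUs. Numbers are illustrative only.
data_parallel = 8      # model replicas, each seeing a different shard of the batch
pipeline_parallel = 4  # consecutive groups of layers placed on different GPU stages
tensor_parallel = 8    # each layer's weight matrices sliced across GPUs (Megatron-style)

total_gpus = data_parallel * pipeline_parallel * tensor_parallel
print(f"{total_gpus} GPUs arranged as {data_parallel} x {pipeline_parallel} x {tensor_parallel}")
```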
What are some alternatives?
ColossalAI - Making large AI models cheaper, faster and more accessible
fairscale - PyTorch extensions for high performance and large scale training.
fairseq - Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
TensorRT - NVIDIA® TensorRT™, an SDK for high-performance deep learning inference, includes a deep learning inference optimizer and runtime that delivers low latency and high throughput for inference applications.
mesh-transformer-jax - Model parallel transformers in JAX and Haiku
Megatron-LM - Ongoing research training transformer models at scale
gpt-neox - An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.
server - The Triton Inference Server provides an optimized cloud and edge inferencing solution.
Finetune_LLMs - Repo for fine-tuning GPTJ and other GPT models
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
pytorch-forecasting - Time series forecasting with PyTorch
llama - Inference code for LLaMA models