idist-snippets
xla
Metric | idist-snippets | xla |
---|---|---|
Mentions | 1 | 8 |
Stars | 4 | 2,296 |
Growth | - | 1.7% |
Activity | 0.0 | 9.9 |
Last commit | almost 3 years ago | 3 days ago |
Language | Python | C++ |
License | - | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
idist-snippets
-
Distributed Training Made Easy with PyTorch-Ignite
Code snippets, as well as commands for running all the scripts, are provided in a separate repository.
xla
-
Who uses Google TPUs for inference in production?
> The PyTorch/XLA Team at Google
Meanwhile you have an issue from 5 years ago with 0 support
https://github.com/pytorch/xla/issues/202
-
Google TPU v5p beats Nvidia H100
PyTorch has had an XLA backend for years. I don't know how performant it is though. https://pytorch.org/xla
-
Why Did Google Brain Exist?
It's curtains for XLA, to be precise. And PyTorch officially supports an XLA backend nowadays too ([1]), which more or less puts JAX and PyTorch on the same foundation.
1. https://github.com/pytorch/xla
-
Accelerating AI inference?
PyTorch supports other kinds of accelerators (e.g. FPGAs, and https://github.com/pytorch/glow), but unless you want to become an ML systems engineer and have money and time to throw away, or a business case to fund it, it is not worth it. In general, both PyTorch and TensorFlow have hardware abstractions that compile down to device code (XLA, https://github.com/pytorch/xla, https://github.com/pytorch/glow). TPUs and GPUs have very different strengths, so getting top performance requires a lot of manual optimization. Considering the cost of training LLMs, it is time well spent.
-
[D] Colab TPU low performance
While TPUs can apparently achieve great speedups in theory, getting to the point where they beat a single GPU requires a lot of fiddling around and debugging. A specific setup is required to make it work properly. E.g., here it says that to exploit TPUs you might need a better CPU than the one in Colab to keep the TPU busy. The tutorials I looked at oversimplified the whole matter, and the same goes for pytorch-lightning, which implies that switching to TPU is as easy as changing a single parameter. Furthermore, none of the tutorials I saw (even after specifically searching for that) went into detail about why and how to set up a GCS bucket for data loading.
-
How to train large deep learning models as a startup
-
Distributed Training Made Easy with PyTorch-Ignite
XLA on TPUs via pytorch/xla.
-
[P] PyTorch for TensorFlow Users - A Minimal Diff
I don't know of any such trick except for using TensorFlow. In fact, I benchmarked PyTorch XLA vs TensorFlow and found that the former's performance was quite abysmal: PyTorch XLA is very slow on Google Colab. The developers' explanation, as I understood it, was that TF was using features not available to the PyTorch XLA developers and that they therefore could not compete on performance. The situation may be different today, I don't know really.
What are some alternatives?
ignite - High-level library to help with training and evaluating neural networks in PyTorch flexibly and transparently.
NCCL - Optimized primitives for collective multi-GPU communication
gloo - Collective communications library with various primitives for multi-machine training.
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]
why-ignite - Why should we use PyTorch-Ignite ?
pocketsphinx - A small speech recognizer
ompi - Open MPI main development repository