TransformerEngine
warp-drive
Metric | TransformerEngine | warp-drive
---|---|---
Mentions | 2 | 1
Stars | 1,428 | 434
Growth | 13.1% | 1.6%
Activity | 9.5 | 8.1
Last commit | 4 days ago | 10 days ago
Language | Python | Python
License | Apache License 2.0 | BSD 3-clause "New" or "Revised" License
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
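
The exact formulas behind Growth and Activity are not given here, so the following is only a minimal sketch of how such metrics could be computed. The function names, the 30-day half-life used for weighting recent commits, and the sample numbers are all assumptions made for illustration. The only property taken from the description above is that a score of 9.0 should correspond to the top 10% of tracked projects.

```python
from bisect import bisect_left


def month_over_month_growth(stars_last_month: int, stars_now: int) -> float:
    """Percentage change in stars relative to last month's count."""
    if stars_last_month <= 0:
        raise ValueError("stars_last_month must be positive")
    return 100.0 * (stars_now - stars_last_month) / stars_last_month


def weighted_commit_score(commit_ages_days, half_life_days: float = 30.0) -> float:
    """Sum of per-commit weights; a commit's weight halves every half_life_days.

    The exponential decay and the 30-day default are assumptions, chosen only
    so that recent commits count more than older ones.
    """
    return sum(0.5 ** (age / half_life_days) for age in commit_ages_days)


def activity_from_rank(score: float, all_scores) -> float:
    """Map a raw score to a 0-10 scale by its rank among all tracked projects.

    The result is 10 times the fraction of projects with a strictly lower
    score, so 9.0 means only 10% of projects score as high or higher.
    """
    ordered = sorted(all_scores)
    below = bisect_left(ordered, score)
    return round(10.0 * below / len(ordered), 1)


if __name__ == "__main__":
    # Hypothetical inputs, not taken from either project.
    print(month_over_month_growth(1000, 1131))
    print(weighted_commit_score([0, 30]))
    print(activity_from_rank(90, range(100)))
```

Ranking against the other tracked projects, rather than reporting the raw commit score, is what makes the number relative: the same commit history could yield a different Activity value if the set of tracked projects changes.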
TransformerEngine
- Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)
  The 4090 now has its 8-bit float (FP8) support enabled as well; see the [Transformer Engine issue](https://github.com/NVIDIA/TransformerEngine/issues/15).
- GPUs for Deep Learning in 2023 – An In-depth Analysis
  I'd be curious to see your benchmarks. By the way, NVIDIA will be providing FP8 support in a future release of CUDA: https://github.com/NVIDIA/TransformerEngine/issues/15
  I think TMA (the Tensor Memory Accelerator) may not matter as much for consumer cards, given the disproportionate amount of FP32/INT32 compute that they have.
  It would be interesting to see how close to theoretical peak performance people can get once CUDA support comes through.
warp-drive
- [N] Salesforce Open-Sources ‘WarpDrive’, A Light Weight Reinforcement Learning (RL) Framework That Implements End-To-End Multi-Agent RL On A Single GPU
What are some alternatives?
Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
Simple-MADRL-Chess - MADRL project solving chess environment using PPO with two different methods: 2 agents/networks and a single agent/network.
autocvd - Tool to automatically set CUDA_VISIBLE_DEVICES based on GPU utilization. Usable from command line and code.
simba-ps - Fast deterministic all-Python Lennard-Jones particle simulator that utilizes Numba for GPU-accelerated computation.
ivy - The Unified AI Framework
torchrec - PyTorch domain library for recommendation systems
nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs.
chainer - A flexible framework of neural networks for deep learning
fastaudio - 🔊 Audio and fastai v2
jittor - Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
liberate-fhe - A Fully Homomorphic Encryption (FHE) library for bridging the gap between theory and practice with a focus on performance and accuracy.
cog - Containers for machine learning