TransformerEngine vs warp-drive

TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference. (by NVIDIA)

docs.nvidia.com

warp-drive

Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022) (by salesforce)

Tags: reinforcement-learning, GPU, CUDA, multiagent-reinforcement-learning, Deep Learning, high-throughput, PyTorch, numba

Project	TransformerEngine	warp-drive
Mentions	2	1
Stars	1,428	434
Growth	13.1%	1.6%
Activity	9.5	8.1
Latest Commit	4 days ago	10 days ago
Language	Python	Python
License	Apache License 2.0	BSD 3-clause "New" or "Revised" License

Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.
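
The page does not give the exact formula behind the Growth figure. As a rough sketch, assuming it is the plain percentage change in star count between two monthly snapshots, the calculation could look like the following. The function names are hypothetical and are not part of any tracker API; the sample numbers are the TransformerEngine values from the table above.

```python
def month_over_month_growth(previous_stars: float, current_stars: float) -> float:
    """Percentage change in stars between two monthly snapshots.

    Assumption: growth = (current - previous) / previous * 100.
    The site may compute its Growth figure differently.
    """
    if previous_stars <= 0:
        raise ValueError("previous_stars must be positive")
    return (current_stars - previous_stars) / previous_stars * 100.0


def implied_previous_stars(current_stars: float, growth_percent: float) -> float:
    """Invert the formula above: the star count one month earlier
    that a given growth percentage would imply."""
    return current_stars / (1.0 + growth_percent / 100.0)


if __name__ == "__main__":
    # Values taken from the comparison table: 1,428 stars, 13.1% growth.
    previous = implied_previous_stars(1428, 13.1)
    print(f"Implied stars one month earlier: {previous:.0f}")
    print(f"Round-trip growth: {month_over_month_growth(previous, 1428):.1f}%")
```

Under this assumption, 1,428 stars at 13.1% growth would correspond to roughly 1,263 stars a month earlier.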

TransformerEngine

Posts with mentions or reviews of TransformerEngine. We have used some of these posts to build our list of alternatives and similar projects. The most recent post listed here is dated 2023-04-30.

Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)
1 project | /r/nvidia | 30 Apr 2023

4090 now has its 8-bit float enabled as well, see the [transformer engine issue](https://github.com/NVIDIA/TransformerEngine/issues/15)

GPUs for Deep Learning in 2023 – An In-depth Analysis
4 projects | news.ycombinator.com | 18 Jan 2023

Would be curious to see your benchmarks. Btw, Nvidia will be providing support for fp8 in a future release of CUDA - https://github.com/NVIDIA/TransformerEngine/issues/15
I think TMA may not matter as much for consumer cards given the disproportionate amount of fp32 / int32 compute that they have.
Would be interesting to see how close to theoretical folks are able to get once CUDA support comes through.

warp-drive

Posts with mentions or reviews of warp-drive. We have used some of these posts to build our list of alternatives and similar projects.

[N] Salesforce Open-Sources ‘WarpDrive’, A Light Weight Reinforcement Learning (RL) Framework That Implements End-To-End Multi-Agent RL On A Single GPU
1 project | /r/MachineLearning | 3 Sep 2021

What are some alternatives?

When comparing TransformerEngine and warp-drive you can also consider the following projects:

Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model

Simple-MADRL-Chess - MADRL project solving chess environment using PPO with two different methods: 2 agents/networks and a single agent/network.

autocvd - Tool to automatically set CUDA_VISIBLE_DEVICES based on GPU utilization. Usable from command line and code.

simba-ps - Fast deterministic all-Python Lennard-Jones particle simulator that utilizes Numba for GPU-accelerated computation.

ivy - The Unified AI Framework

torchrec - Pytorch domain library for recommendation systems

nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs.

chainer - A flexible framework of neural networks for deep learning

fastaudio - 🔊 Audio and fastai v2

jittor - Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.

liberate-fhe - A Fully Homomorphic Encryption (FHE) library for bridging the gap between theory and practice with a focus on performance and accuracy.

cog - Containers for machine learning