TransformerEngine
PyTorch-Guide
| | TransformerEngine | PyTorch-Guide |
|---|---|---|
| Mentions | 2 | 2 |
| Stars | 1,428 | 23 |
| Growth | 13.1% | - |
| Activity | 9.5 | 1.8 |
| Last commit | 4 days ago | over 2 years ago |
| Language | Python | Python |
| License | Apache License 2.0 | - |
Stars - the number of stars a project has on GitHub. Growth - month-over-month growth in stars.
Activity - a relative number indicating how actively a project is being developed; recent commits are weighted more heavily than older ones.
For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects we track.
TransformerEngine
- Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)
  The 4090 now has its 8-bit float (FP8) support enabled as well; see the [Transformer Engine issue](https://github.com/NVIDIA/TransformerEngine/issues/15).
- GPUs for Deep Learning in 2023 – An In-depth Analysis
  Would be curious to see your benchmarks. By the way, Nvidia will be providing FP8 support in a future release of CUDA: https://github.com/NVIDIA/TransformerEngine/issues/15
  I think TMA may not matter as much for consumer cards, given the disproportionate amount of FP32/INT32 compute they have.
  Would be interesting to see how close to the theoretical peak people are able to get once CUDA support comes through.
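The FP8 format these comments refer to is E4M3 (1 sign bit, 4 exponent bits, 3 mantissa bits), one of the two 8-bit float formats Transformer Engine uses. As a rough illustration of why FP8 is so coarse, here is a plain-Python sketch that rounds a value to the nearest E4M3-representable number; `quantize_e4m3` is a hypothetical helper for illustration, not part of Transformer Engine's API, and it ignores NaN/Inf handling:

```python
import math

E4M3_MAX = 448.0  # largest finite E4M3 value

def quantize_e4m3(x: float) -> float:
    """Round x to the nearest FP8 E4M3-representable value (illustrative sketch)."""
    if x == 0.0:
        return 0.0
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)      # saturate to the format's maximum
    e = math.floor(math.log2(mag))   # pick the binade of the value
    e = max(e, -6)                   # subnormal range: exponent floors at -6
    m = round(mag / 2**e * 8) / 8    # 3 mantissa bits -> steps of 1/8
    if m >= 2.0:                     # mantissa rounded past 2: carry into exponent
        m /= 2.0
        e += 1
    return sign * min(m * 2**e, E4M3_MAX)
```

For example, 0.3 rounds to 0.3125 and anything above 448 saturates, which is why FP8 training relies on the per-tensor scaling that Transformer Engine manages automatically.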
PyTorch-Guide
- Useful Tools and Programs for Deep Learning with PyTorch
- Cool PyTorch Guide/Wiki
  PyTorch Guide/Wiki: https://github.com/mikeroyal/PyTorch-Guide
What are some alternatives?
- Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
- halutmatmul - Hashed Lookup Table based Matrix Multiplication (halutmatmul) - Stella Nera accelerator
- autocvd - Tool to automatically set CUDA_VISIBLE_DEVICES based on GPU utilization. Usable from the command line and from code.
- NeuralCDE - Code for "Neural Controlled Differential Equations for Irregular Time Series" (NeurIPS 2020 Spotlight)
- warp-drive - Extremely fast end-to-end deep multi-agent reinforcement learning framework on a GPU (JMLR 2022)
- cog - Containers for machine learning
- ivy - The Unified AI Framework
- bittensor - Internet-scale Neural Networks
- nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs
- fastaudio - 🔊 Audio and fastai v2
- liberate-fhe - A Fully Homomorphic Encryption (FHE) library for bridging the gap between theory and practice, with a focus on performance and accuracy
- FastFold - Optimizing AlphaFold training and inference on GPU clusters