chainer
TransformerEngine
Metric | chainer | TransformerEngine
---|---|---
Mentions | 2 | 2
Stars | 5,861 | 1,411
Growth | 0.3% | 12.0%
Activity | 0.0 | 9.5
Last commit | 8 months ago | 2 days ago
Language | Python | Python
License | MIT License | Apache License 2.0
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
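As a rough illustration of the growth figure described above, the sketch below computes month-over-month growth as a plain percentage change in star counts. The page does not state its exact formula, so the function names and the percentage-change definition are assumptions made for illustration only.

```python
def month_over_month_growth(stars_now: int, stars_prev: int) -> float:
    """Percentage change in stars relative to the previous month (assumed definition)."""
    if stars_prev <= 0:
        raise ValueError("stars_prev must be positive")
    return (stars_now - stars_prev) / stars_prev * 100.0


def implied_previous_stars(stars_now: int, growth_percent: float) -> float:
    """Invert the percentage-change formula to estimate last month's star count."""
    return stars_now / (1.0 + growth_percent / 100.0)


# TransformerEngine: 1,411 stars at 12.0% growth implies roughly 1,260 stars a month earlier.
print(round(implied_previous_stars(1411, 12.0)))  # 1260
```

Under this assumed definition, the 0.3% figure for chainer would correspond to fewer than 20 new stars over the month.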
chainer
- ChaiNNer – Node/Graph based image processing and AI upscaling GUI
  There is already an AI framework named Chainer: https://github.com/chainer/chainer
- Protip: the upscaler matters a lot
  Sorry maybe someone could chime in and help but I use chainer to upscale. https://github.com/chainer/chainer
TransformerEngine
- Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)
  4090 now has its 8-bit float enabled as well, see the [transformer engine issue](https://github.com/NVIDIA/TransformerEngine/issues/15)
- GPUs for Deep Learning in 2023 – An In-depth Analysis
  Would be curious to see your benchmarks. Btw, Nvidia will be providing support for fp8 in a future release of CUDA - https://github.com/NVIDIA/TransformerEngine/issues/15
  I think TMA may not matter as much for consumer cards given the disproportionate amount of fp32 / int32 compute that they have.
  Would be interesting to see how close to theoretical folks are able to get once CUDA support comes through.
What are some alternatives?
chaiNNer - A node-based image processing GUI aimed at making chaining image processing tasks easy and customizable. Born as an AI upscaling application, chaiNNer has grown into an extremely flexible and powerful programmatic image processing application.
Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
leptonai - A Pythonic framework to simplify AI service building
autocvd - Tool to automatically set CUDA_VISIBLE_DEVICES based on GPU utilization. Usable from command line and code.
tmu - Implements the Tsetlin Machine, Coalesced Tsetlin Machine, Convolutional Tsetlin Machine, Regression Tsetlin Machine, and Weighted Tsetlin Machine, with support for continuous features, drop clause, Type III Feedback, focused negative sampling, multi-task classifier, autoencoder, literal budget, and one-vs-one multi-class classifier. TMU is written in Python with wrappers for C and CUDA-based clause evaluation and updating.
warp-drive - Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
XNOR-popcount-GEMM-PyTorch-CPU-CUDA - A PyTorch implementation of real XNOR-popcount (1-bit op) GEMM, a Linear PyTorch extension supporting both CPU and CUDA
ivy - The Unified AI Framework
SmallPebble - Minimal deep learning library written from scratch in Python, using NumPy/CuPy.
nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs.
fastaudio - 🔊 Audio and fastai v2