TransformerEngine
ivy
Metric | TransformerEngine | ivy |
---|---|---|
Mentions | 2 | 17 |
Stars | 1,428 | 14,021 |
Growth | 13.1% | 0.5% |
Activity | 9.5 | 10.0 |
Latest commit | 4 days ago | about 15 hours ago |
Language | Python | Python |
License | Apache License 2.0 | GNU General Public License v3.0 or later |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
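As a small worked example of the Growth figure, month-over-month growth can be read as the percentage change in stars relative to the count one month earlier. The sketch below assumes that reading; the function name `month_over_month_growth` and the earlier star count are hypothetical and are not taken from the page.

```python
def month_over_month_growth(stars_now: int, stars_month_ago: int) -> float:
    """Percentage change in star count relative to one month earlier."""
    if stars_month_ago <= 0:
        raise ValueError("stars_month_ago must be positive")
    return (stars_now - stars_month_ago) / stars_month_ago * 100.0


# 1,428 is TransformerEngine's star count from the table;
# 1,263 is a hypothetical earlier count chosen only for illustration.
growth = month_over_month_growth(1428, 1263)
print(f"{growth:.1f}%")
```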
TransformerEngine
- Benchmarking Large Language Models on NVIDIA H100 GPUs with CoreWeave (Part 1)
The 4090 now has its 8-bit float (FP8) support enabled as well; see the [transformer engine issue](https://github.com/NVIDIA/TransformerEngine/issues/15)
- GPUs for Deep Learning in 2023 - An In-depth Analysis
Would be curious to see your benchmarks. Btw, Nvidia will be providing support for fp8 in a future release of CUDA - https://github.com/NVIDIA/TransformerEngine/issues/15
I think TMA may not matter as much for consumer cards given the disproportionate amount of fp32 / int32 compute that they have.
Would be interesting to see how close to theoretical folks are able to get once CUDA support comes through.
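To give a feel for how coarse the 8-bit float discussed in these comments is, here is a rough pure-Python sketch that rounds a value to the nearest number representable in the E4M3 flavor of FP8 (1 sign bit, 4 exponent bits, 3 mantissa bits, largest finite value 448). The helper name `quantize_e4m3` is hypothetical, and this is only an illustration of the number format: it is not how TransformerEngine or CUDA implement FP8, which rely on hardware support and scaling factors.

```python
import math

E4M3_MAX = 448.0        # largest finite E4M3 value
MIN_NORMAL_EXP = -6     # exponent of the smallest normal number (2**-6)
MANTISSA_BITS = 3


def quantize_e4m3(x: float) -> float:
    """Round x to the nearest E4M3-representable value, saturating at +/-448."""
    if math.isnan(x) or x == 0.0:
        return x
    sign = -1.0 if x < 0 else 1.0
    mag = min(abs(x), E4M3_MAX)                # saturate instead of overflowing
    _, e = math.frexp(mag)                     # mag = m * 2**e with 0.5 <= m < 1
    exp = max(e - 1, MIN_NORMAL_EXP)           # clamp so small values use the subnormal step
    step = 2.0 ** (exp - MANTISSA_BITS)        # spacing between neighbours in this binade
    return sign * round(mag / step) * step     # round() breaks ties to even


for value in (0.1, 0.3, 1.0, 1000.0):
    print(value, "->", quantize_e4m3(value))
```

With only three mantissa bits, each power-of-two interval holds just eight distinct values, which is why FP8 training setups typically pair the format with scaling.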
ivy
- Keras 3.0
See also https://github.com/unifyai/ivy, which I have not tried but which seems along the lines of what you are describing: working with all the major frameworks.
- Show HN: Carton - Run any ML model from any programming language
Is this ancillary to what [these guys](https://github.com/unifyai/ivy) are trying to do?
- Ivy: All in one machine learning framework
- Ivy ML Transpiler and Framework
- [D] Keras 3.0 Announcement: Keras for TensorFlow, JAX, and PyTorch
https://unify.ai/ They are trying to do what Ivy is doing already.
- Ask for help: what is the best way to have code both support torch and numpy?
Check Ivy.
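For context on the torch/numpy question, one lightweight approach that does not depend on Ivy is to look up the array's own library at call time and use only functions and methods that both libraries share. The sketch below assumes inputs are `numpy.ndarray` or `torch.Tensor` objects; `namespace_of` and `softmax` are hypothetical helper names and are not part of Ivy, NumPy, or PyTorch.

```python
import importlib


def namespace_of(x):
    """Return the top-level module that defines type(x), e.g. numpy or torch."""
    root = type(x).__module__.split(".")[0]
    return importlib.import_module(root)


def softmax(x):
    """Numerically stable softmax over all elements of an array or tensor."""
    xp = namespace_of(x)      # numpy for an ndarray, torch for a Tensor
    shifted = x - x.max()     # subtract the max so exp() cannot overflow
    e = xp.exp(shifted)       # exp() exists under the same name in both libraries
    return e / e.sum()
```

Calling `softmax(np.array([1.0, 2.0, 3.0]))` returns a NumPy array, and `softmax(torch.tensor([1.0, 2.0, 3.0]))` returns a tensor. This only works while the code sticks to the overlap between the two APIs, which is the gap that projects like Ivy aim to cover more broadly.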
- CoreML Stable Diffusion
ROCm's great for data centers, but good luck finding anything about desktop GPUs on their site apart from this lone blog post: https://community.amd.com/t5/instinct-accelerators/exploring...
There's a good explanation of AMD's ROCm targets here: https://news.ycombinator.com/item?id=28200477
It's currently a PITA to get common Python libs like Numba to even talk to AMD cards (admittedly Numba won't talk to older Nvidia cards either and they deprecate ruthlessly; I had to downgrade 8 versions to get it working with a 5yo mobile workstation). YC-backed Ivy claims to be working on unifying ML frameworks in a hardware-agnostic way but I don't have enough experience to assess how well they're succeeding yet: https://lets-unify.ai
I was happy to see DiffusionBee does talk to the GPU in my late-model Intel Mac, though for some reason it only uses 50% of its power right now. I'm sure the situation will improve as Metal 3.0 and Vulkan get more established.
- DL Frameworks in a nutshell
Won't it all come together with https://lets-unify.ai/ ?
- Unified Machine Learning
- [Discussion] Opinions on unify AI
What do you think about Unify AI (https://lets-unify.ai)?
What are some alternatives?
Whisper - High-performance GPGPU inference of OpenAI's Whisper automatic speech recognition (ASR) model
PaddleNLP - Easy-to-use and powerful NLP and LLM library with Awesome model zoo, supporting wide-range of NLP tasks from research to industrial applications, including Text Classification, Neural Search, Question Answering, Information Extraction, Document Intelligence, Sentiment Analysis etc.
autocvd - Tool to automatically set CUDA_VISIBLE_DEVICES based on GPU utilization. Usable from command line and code.
ColossalAI - Making large AI models cheaper, faster and more accessible
warp-drive - Extremely Fast End-to-End Deep Multi-Agent Reinforcement Learning Framework on a GPU (JMLR 2022)
DeepFaceLive - Real-time face swap for PC streaming or video calls
nanoGPT - The simplest, fastest repository for training/finetuning medium-sized GPTs.
PaddleOCR - Awesome multilingual OCR toolkits based on PaddlePaddle (practical ultra lightweight OCR system, support 80+ languages recognition, provide data annotation and synthesis tools, support training and deployment among server, mobile, embedded and IoT devices)
fastaudio - Audio and fastai v2
lisp - Toy Lisp 1.5 interpreter
liberate-fhe - A Fully Homomorphic Encryption (FHE) library for bridging the gap between theory and practice with a focus on performance and accuracy.
Kornia - Geometric Computer Vision Library for Spatial AI