TensorRT Alternatives
Similar projects and alternatives to TensorRT
- FasterTransformer: Transformer-related optimization, including BERT and GPT
- DeepSpeed: A deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- transformer-deploy: Efficient, scalable, and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
- tensorrtx: Implementation of popular deep learning networks with the TensorRT network definition API
- jetson-inference: Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
- onnxruntime: ONNX Runtime, a cross-platform, high-performance ML inference and training accelerator
- nebullvm: Plug-and-play modules to optimize the performance of your AI systems 🚀
- TensorRT (by pytorch): PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT
- kernl: Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
- neural-compressor: Intel® Neural Compressor (formerly Intel® Low Precision Optimization Tool) provides unified APIs for network compression technologies such as low-precision quantization, sparsity, pruning, and knowledge distillation across different deep learning frameworks, in pursuit of optimal inference performance.
TensorRT reviews and mentions
- WIP: TensorRT-accelerated stable diffusion img2img from a mobile camera over WebRTC + Whisper speech-to-text. Interdimensional cable is here! Code: https://github.com/venetanji/videosd
  It uses the NVIDIA demo code from https://github.com/NVIDIA/TensorRT/tree/main/demo/Diffusion
- [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl
  The traditional way to deploy a model is to export it to ONNX, then to the TensorRT plan format. Each step requires its own tooling and its own mental model, and may raise its own issues. The most annoying thing is that you need Microsoft or Nvidia support to get the best performance, and sometimes model support takes time. For instance, T5, a model released in 2019, is still not correctly supported on TensorRT; in particular, the K/V cache is missing (it will come soon according to TensorRT maintainers, but I wrote the very same thing almost 1 year ago and then 4 months ago, so… I don't know).
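  A minimal sketch of the first step of that pipeline, the PyTorch-to-ONNX export; the model, shapes, and tensor names below are placeholders, not the post's actual setup:

```python
# Hypothetical stand-in model and I/O names; the .onnx file produced here is
# what you would then feed to TensorRT (e.g. trtexec) to build a .plan engine.
import torch

model = torch.nn.Linear(768, 768).eval()
dummy = torch.randn(1, 768)

torch.onnx.export(
    model,
    dummy,
    "model.onnx",
    input_names=["input"],
    output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # keep batch dynamic
)
```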
- Speeding up T5
  I've tried to speed it up with TensorRT and followed this example: https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb - it does give a considerable speedup for batch-size=1, but it does not work with bigger batch sizes, which makes it useless, as I can simply increase the batch size of the HuggingFace model.
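  For context, a TensorRT engine only accepts shapes inside the optimization profile it was built with, so supporting bigger batches means building with a wider profile. A minimal sketch with the Python builder API, assuming an ONNX export whose input is named input_ids (file name, tensor name, and shapes are placeholders):

```python
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:  # hypothetical file name
    assert parser.parse(f.read()), parser.get_error(0)

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min / opt / max input shapes: the engine then accepts any batch in [1, 32]
profile.set_shape("input_ids", (1, 128), (8, 128), (32, 128))
config.add_optimization_profile(profile)

engine = builder.build_serialized_network(network, config)  # TensorRT >= 8
```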
- An open-source library for optimizing deep learning inference: (1) you select the target optimization, (2) nebullvm searches for the best optimization techniques for your model-hardware configuration, and then (3) it serves an optimized model that runs much faster at inference
  Open-source projects leveraged by nebullvm include OpenVINO, TensorRT, Intel Neural Compressor, SparseML and DeepSparse, Apache TVM, ONNX Runtime, TFLite and XLA. A huge thank you to the open-source community for developing and maintaining these amazing projects.
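  A rough sketch of that three-step flow, assuming the optimize_model entry point from nebullvm's README of that era (the import path has moved across releases, so treat this as illustrative rather than exact):

```python
import torch
from nebullvm.api.functions import optimize_model  # assumed 0.x-era import path

model = torch.nn.Linear(768, 768).eval()       # hypothetical stand-in model
input_data = [((torch.randn(1, 768),), None)]  # sample inputs driving the search

# nebullvm benchmarks whichever backends it finds installed (TensorRT,
# OpenVINO, ONNX Runtime, ...) and returns the fastest variant it built.
optimized_model = optimize_model(
    model,
    input_data=input_data,
    optimization_time="constrained",  # assumed flag bounding the search time
)
```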
- I was looking for some great quantization open-source libraries that could actually be applied in production (on both edge and cloud CPU/GPU). Do you know if I am missing any good libraries?
  Nvidia Quantization | Quantization with TensorRT
- Can you run a quantized model on GPU?
  You might want to try Nvidia's quantization toolkit for PyTorch: https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization
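  A minimal sketch of the toolkit's automatic path as described in its docs: quant_modules.initialize() swaps torch.nn layers for fake-quantized equivalents, so a model built afterwards carries the quantization nodes TensorRT can later consume. Calibration on real data is still required before export and is omitted here:

```python
import torch
import torchvision
from pytorch_quantization import quant_modules

quant_modules.initialize()                    # monkey-patch nn layers globally
model = torchvision.models.resnet50().eval()  # built with quantized layer variants

with torch.no_grad():
    out = model(torch.randn(1, 3, 224, 224))  # forward pass runs with fake-quant
```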
- [P] What we learned by making T5-large 2X faster than Pytorch (and any autoregressive transformer)
  The TensorRT demo from Nvidia heavily optimizes the computation graph (through aggressive kernel fusions), making T5 inference very fast (they report a 10X speedup on small-T5). The trick is that it doesn't use any cache, so it's very fast on short sequences and small models, as it avoids many memory-bound operations by redoing the full computation again and again... but as several users have already found (1, 2, 3, 4, ...), this approach doesn't scale when the computation intensity increases, i.e., when base or large models are used instead of a small one, when generation is done on a moderately long sequence of a few hundred tokens, or when beam search is used instead of greedy search. The graph above shows the same behavior with Onnx Runtime.
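  The cache trade-off is easy to reproduce with vanilla Hugging Face transformers: use_cache=False recomputes every past key/value at each generated token (the TensorRT-demo-style approach), while use_cache=True reuses them. A small sketch, with t5-small as a placeholder:

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()
inputs = tokenizer("translate English to French: Hello", return_tensors="pt")

with torch.no_grad():
    # recompute all past K/V at every step (fast only for short outputs)
    no_cache = model.generate(**inputs, max_new_tokens=200, use_cache=False)
    # reuse cached K/V from previous steps (scales to long outputs)
    cached = model.generate(**inputs, max_new_tokens=200, use_cache=True)
```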
- [P] [D] I made a TensorRT example. I hope this will help beginners. And I also have a question about TensorRT best practice.
  I used trtexec from the Nvidia NGC image. It makes converting a model to TensorRT very easy and simple.
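  For reference, a sketch of driving trtexec from Python, assuming it is on PATH (e.g. inside an NGC TensorRT or PyTorch container); file names are placeholders:

```python
import subprocess

subprocess.run(
    [
        "trtexec",
        "--onnx=model.onnx",        # input ONNX model
        "--saveEngine=model.plan",  # serialized TensorRT engine to write
        "--fp16",                   # allow FP16 kernels where they are faster
    ],
    check=True,
)
```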
- [P] Python library to optimize Hugging Face transformer for inference: < 0.5 ms latency / 2850 infer/sec
  On the other side of the spectrum, there are Nvidia demos (here or there) showing us how to manually build a full Transformer graph (operator by operator) in TensorRT to get the best performance from their hardware. It's out of reach for many NLP practitioners, and it's time-consuming to debug/maintain/adapt to a slightly different architecture (I tried). Plus, there is a secret: the very optimized model only works for specific sequence lengths and batch sizes. The truth is that, so far (and it will improve soon), it's mainly for the MLPerf benchmark (the one used to compare DL hardware), marketing content, and very specialized engineers.
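  To give a feel for what "operator by operator" means, here is a sketch of a single projection plus activation in the TensorRT network definition API; a full transformer layer needs dozens of such calls, with weights loaded by hand (shapes and names here are placeholders):

```python
import numpy as np
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)

# One projection, y = relu(x @ W); real weights would come from a checkpoint.
x = network.add_input("x", trt.float32, (1, 128, 768))
w = network.add_constant(
    (1, 768, 768), trt.Weights(np.zeros((1, 768, 768), dtype=np.float32))
)
mm = network.add_matrix_multiply(
    x, trt.MatrixOperation.NONE, w.get_output(0), trt.MatrixOperation.NONE
)
act = network.add_activation(mm.get_output(0), trt.ActivationType.RELU)
network.mark_output(act.get_output(0))
```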
Stats
NVIDIA/TensorRT is an open-source project licensed under the Apache License 2.0, an OSI-approved license.