TensorRT vs torch2trt
| | TensorRT | torch2trt |
|---|---|---|
| Mentions | 5 | 5 |
| Stars | 2,328 | 4,388 |
| Growth | 3.2% | 1.7% |
| Activity | 9.6 | 3.1 |
| Last commit | 6 days ago | about 1 month ago |
| Language | Python | Python |
| License | BSD 3-clause "New" or "Revised" License | MIT License |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
TensorRT
- Learn TensorRT optimization
- [P] [D] I made TensorRT example. I hope this will help beginners. And I also have a question about TensorRT best practice.
- [P] 4.5 times faster Hugging Face transformer inference by modifying some Python AST
  Have you tried the new Torch-TensorRT compiler from NVIDIA?
- PyTorch 1.10
  You can also quantize your model to FP16 or Int8 using PTQ, which should give you an additional inference speedup. Here is a tutorial[2] for leveraging TRTorch.
  [1] https://github.com/NVIDIA/TRTorch/tree/master/core
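To make the comments above more concrete, here is a minimal FP16 compilation sketch with Torch-TensorRT (the compiler mentioned in the first comment, and the current name of the TRTorch project linked as [1]). The ResNet-50 model, the input shape, and the torchvision import are illustrative assumptions, not details from the original posts; Int8 PTQ would additionally require a calibration dataset.

```python
# A minimal FP16 compilation sketch with Torch-TensorRT (formerly TRTorch).
# Model choice and input shape are illustrative assumptions.
import torch
import torch_tensorrt
import torchvision.models as models

model = models.resnet50(pretrained=True).eval().cuda()

# Compile the model; supported subgraphs are lowered to TensorRT engines.
trt_model = torch_tensorrt.compile(
    model,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # fixed batch/shape
    enabled_precisions={torch.half},                   # allow FP16 kernels
)

x = torch.randn(1, 3, 224, 224).cuda()
with torch.no_grad():
    y = trt_model(x)  # inference now runs through the TensorRT engines
```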
torch2trt
- [D] How do you deploy your ML model?
- PyTorch 1.10
  The main thing you want for server inference is auto-batching. It's a feature that's included in ONNX Runtime, TorchServe, NVIDIA Triton Inference Server, and Ray Serve.
  If you have a lot of preprocessing and post-processing logic around your model, it can be hard to export it for ONNX Runtime or Triton, so I usually recommend starting with Ray Serve (https://docs.ray.io/en/latest/serve/index.html) and using an actor that runs inference with a quantized model or one optimized with TensorRT (https://github.com/NVIDIA-AI-IOT/torch2trt). A minimal sketch of this setup follows after this list.
- Jetson Nano: TensorFlow model. Possibly I should use PyTorch instead?
  https://github.com/NVIDIA-AI-IOT/torch2trt <- pretty straightforward
  https://github.com/jkjung-avt/tensorrt_demos <- this helped me a lot
- How to get TensorFlow model to run on Jetson Nano?
  I find PyTorch easier to work with generally. NVIDIA has a PyTorch --> TensorRT converter (torch2trt) which yields some significant speedups and has a simple Python API. Convert the PyTorch model on the Nano.
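The converter mentioned in the last comment is torch2trt. A minimal conversion sketch, assuming a torchvision model and a fixed input shape (both illustrative, not from the original comment), looks roughly like this and would be run on the target device such as the Jetson Nano:

```python
# A minimal torch2trt conversion sketch; the AlexNet model and input size
# are illustrative assumptions.
import torch
import torchvision.models as models
from torch2trt import torch2trt

model = models.alexnet(pretrained=True).eval().cuda()
x = torch.ones((1, 3, 224, 224)).cuda()  # example input fixing the engine's shape

# Build a TensorRT engine from the model; fp16_mode trades a little
# precision for speed on devices like the Jetson Nano.
model_trt = torch2trt(model, [x], fp16_mode=True)

with torch.no_grad():
    y = model(x)
    y_trt = model_trt(x)

# Sanity-check that the optimized model stays numerically close.
print(torch.max(torch.abs(y - y_trt)))
```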
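For the Ray Serve recommendation earlier in this list, here is a minimal sketch of a deployment with auto-batching, assuming Ray Serve 2.x. The class name, batch parameters, and the stand-in linear model are illustrative assumptions; in practice you would load a quantized or torch2trt-optimized module instead.

```python
# A minimal auto-batching sketch with Ray Serve; names and the stand-in
# model are illustrative assumptions, not from the original comment.
import torch
from ray import serve


@serve.deployment
class TrtClassifier:
    def __init__(self):
        # Stand-in model; in practice, load a quantized or torch2trt-optimized
        # module (e.g. a saved TRTModule state_dict) here.
        self.model = torch.nn.Linear(128, 10).cuda().eval()

    @serve.batch(max_batch_size=8, batch_wait_timeout_s=0.01)
    async def handle_batch(self, inputs):
        # Ray Serve collects concurrent requests into `inputs` (a list of
        # tensors) so the GPU sees a single batched forward pass.
        batch = torch.stack(inputs).cuda()
        with torch.no_grad():
            out = self.model(batch)
        return [o.cpu() for o in out]

    async def __call__(self, request):
        payload = await request.json()
        x = torch.tensor(payload["input"], dtype=torch.float32)
        result = await self.handle_batch(x)
        return result.tolist()


app = TrtClassifier.bind()  # start with `serve.run(app)`
```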
What are some alternatives?
onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
onnx-simplifier - Simplify your onnx model
cutlass - CUDA Templates for Linear Algebra Subroutines
Pytorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration
transformer-deploy - Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
TensorRT - NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.
tensorrt_demos - TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet
jetson - Self-driving AI toy car 🤖🚗.
trt_pose - Real-time pose estimation accelerated with NVIDIA TensorRT