transformer-deploy VS torch2trt

Compare transformer-deploy vs torch2trt and see what their differences are.

transformer-deploy

Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀 (by ELS-RD)
                 transformer-deploy    torch2trt
Mentions         8                     5
Stars            1,615                 4,388
Growth           0.7%                  1.7%
Activity         6.8                   3.1
Latest commit    6 months ago          about 1 month ago
Language         Python                Python
License          Apache License 2.0    MIT License
  • Mentions - the total number of mentions that we've tracked plus the number of user-suggested alternatives.
  • Stars - the number of stars that a project has on GitHub.
  • Growth - month-over-month growth in stars.
  • Activity - a relative number indicating how actively a project is being developed; recent commits carry more weight than older ones. For example, an activity of 9.0 indicates that a project is among the top 10% of the most actively developed projects that we are tracking.

transformer-deploy

Posts with mentions or reviews of transformer-deploy. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-10-28.

torch2trt

Posts with mentions or reviews of torch2trt. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-10-27.
  • [D] How you deploy your ML model?
    5 projects | /r/MachineLearning | 27 Oct 2021
  • PyTorch 1.10
    8 projects | news.ycombinator.com | 22 Oct 2021
    The main thing you want for server inference is auto-batching. It's a feature that's included in onnxruntime, torchserve, NVIDIA Triton Inference Server and Ray Serve.

    If you have a lot of preprocessing and post-processing logic in your model, it can be hard to export it for onnxruntime or Triton, so I usually recommend starting with Ray Serve (https://docs.ray.io/en/latest/serve/index.html) and using an actor that runs inference with a quantized model or one optimized with TensorRT (https://github.com/NVIDIA-AI-IOT/torch2trt). A sketch of this pattern follows the list below.

  • Jetson Nano: TensorFlow model. Possibly I should use PyTorch instead?
    2 projects | /r/pytorch | 4 Jun 2021
    https://github.com/NVIDIA-AI-IOT/torch2trt <- pretty straightforward
    https://github.com/jkjung-avt/tensorrt_demos <- this helped me a lot
  • How to get TensorFlow model to run on Jetson Nano?
    1 project | /r/computervision | 4 Jun 2021
    I find PyTorch easier to work with generally. NVIDIA has a PyTorch -> TensorRT converter (torch2trt) which yields some significant speedups and has a simple Python API; a usage sketch follows this list. Convert the PyTorch model on the Nano.
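
The converter mentioned in the posts above is torch2trt's main entry point. As a rough sketch based on the project's public README (the model choice, input shape and file name are illustrative, not taken from the posts), conversion and inference look like this:

    import torch
    from torch2trt import torch2trt, TRTModule
    from torchvision.models import resnet18

    # An ordinary PyTorch model, moved to the GPU in eval mode.
    model = resnet18(pretrained=True).eval().cuda()

    # Example input with the shape the TensorRT engine will be built for.
    x = torch.randn(1, 3, 224, 224).cuda()

    # torch2trt traces the model and builds a TensorRT engine;
    # fp16_mode trades a little accuracy for speed on Jetson-class GPUs.
    model_trt = torch2trt(model, [x], fp16_mode=True)

    # The optimized module is called like a regular nn.Module.
    y = model(x)
    y_trt = model_trt(x)
    print(torch.max(torch.abs(y - y_trt)))  # difference should be small

    # Save the engine so it does not have to be rebuilt on every start.
    torch.save(model_trt.state_dict(), "resnet18_trt.pth")

    model_trt_loaded = TRTModule()
    model_trt_loaded.load_state_dict(torch.load("resnet18_trt.pth"))

Note that the engine is built for the example input's shape, so inference-time inputs should generally match that shape unless the engine was built with dynamic shapes.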
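The Ray Serve suggestion above (an actor that keeps pre/post-processing in plain Python and calls an optimized model) could look roughly like the following. This is a hedged sketch assuming the Ray Serve 2.x deployment API and the saved engine from the previous snippet; the class name, payload format and file path are made up for illustration:

    import torch
    from ray import serve
    from starlette.requests import Request
    from torch2trt import TRTModule

    @serve.deployment(ray_actor_options={"num_gpus": 1})
    class TRTClassifier:
        def __init__(self):
            # Load the previously saved torch2trt engine (path is illustrative).
            self.model = TRTModule()
            self.model.load_state_dict(torch.load("resnet18_trt.pth"))

        async def __call__(self, request: Request) -> dict:
            # Arbitrary Python pre-processing can live here...
            payload = await request.json()
            x = torch.tensor(payload["inputs"], dtype=torch.float32).cuda()
            with torch.no_grad():
                y = self.model(x)
            # ...and arbitrary post-processing here.
            return {"argmax": y.argmax(dim=1).tolist()}

    # Start the deployment; requests are then served over HTTP.
    serve.run(TRTClassifier.bind())

Ray Serve also provides a @serve.batch decorator, which covers the auto-batching the post mentions without exporting the model out of Python.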

What are some alternatives?

When comparing transformer-deploy and torch2trt you can also consider the following projects:

TensorRT - NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Torch-TensorRT - PyTorch/TorchScript/FX compiler for NVIDIA GPUs using TensorRT

FasterTransformer - Transformer related optimization, including BERT, GPT

onnx-simplifier - Simplify your onnx model

PyTorch - Tensors and Dynamic neural networks in Python with strong GPU acceleration

OpenSeeFace - Robust realtime face and facial landmark tracking on CPU with Unity integration

onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

mmrazor - OpenMMLab Model Compression Toolbox and Benchmark.

tensorrt_demos - TensorRT MODNet, YOLOv4, YOLOv3, SSD, MTCNN, and GoogLeNet

sparsednn - Fast sparse deep learning on CPUs

trt_pose - Real-time pose estimation accelerated with NVIDIA TensorRT