TensorRT vs onnx-tensorrt

TensorRT

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT. (by NVIDIA)

developer.nvidia.com

onnx-tensorrt

ONNX-TensorRT: TensorRT backend for ONNX (by onnx)

Onnx Deep Learning Nvidia

Project	TensorRT	onnx-tensorrt
Mentions	22	4
Stars	9,031	2,745
Growth	3.6%	2.0%
Activity	5.0	4.1
Latest Commit	13 days ago	12 days ago
Language	C++	C++
License	Apache License 2.0	Apache License 2.0

The number of mentions is the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.

TensorRT

Posts with mentions or reviews of TensorRT. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2023-09-26.

Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration
14 projects | news.ycombinator.com | 26 Sep 2023

- https://github.com/NVIDIA/TensorRT
TVM and other compiler-based approaches seem to perform really well and make supporting different backends really easy. A good friend who's been in this space for a while told me llama.cpp is sort of a "hand crafted" version of what these compilers could output, which I think speaks to the craftsmanship Georgi and the ggml team have put into llama.cpp, but also the opportunity to "compile" versions of llama.cpp for other model architectures or platforms.
Nvidia Introduces TensorRT-LLM for Accelerating LLM Inference on H100/A100 GPUs
3 projects | news.ycombinator.com | 8 Sep 2023

https://github.com/NVIDIA/TensorRT/issues/982
Maybe? Looks like TensorRT does work, but I couldn't find much.
Train Your AI Model Once and Deploy on Any Cloud
3 projects | news.ycombinator.com | 8 Jul 2023

highly optimized transformer-based encoder and decoder component, supported on pytorch, tensorflow and triton
TensorRT, a custom ML framework/inference runtime from Nvidia, https://developer.nvidia.com/tensorrt, but you have to port your models
A1111 just added support for TensorRT for webui as an extension!
5 projects | /r/StableDiffusion | 27 May 2023
WIP - TensorRT accelerated stable diffusion img2img from mobile camera over webrtc + whisper speech to text. Interdimensional cable is here! Code: https://github.com/venetanji/videosd
3 projects | /r/StableDiffusion | 21 Feb 2023

It uses the nvidia demo code from: https://github.com/NVIDIA/TensorRT/tree/main/demo/Diffusion
[P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl
7 projects | /r/MachineLearning | 8 Feb 2023

The traditional way to deploy a model is to export it to ONNX, then to the TensorRT plan format. Each step requires its own tooling and its own mental model, and each may raise issues. The most annoying thing is that you need Microsoft or Nvidia support to get the best performance, and sometimes model support takes time. For instance, T5, a model released in 2019, is not yet correctly supported on TensorRT; in particular, the K/V cache is missing (it will be soon according to the TensorRT maintainers, but I wrote the very same thing almost 1 year ago and then 4 months ago, so… I don't know).
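The second step of that pipeline (ONNX file to TensorRT plan) is commonly done with NVIDIA's `trtexec` command-line tool. The following is a minimal sketch, not the poster's actual setup: it only assembles the command line, assuming an ONNX file has already been exported. The helper name `build_trtexec_command` and the file names `model.onnx` and `model.plan` are hypothetical; `--onnx`, `--saveEngine` and `--fp16` are `trtexec` options.

```python
from typing import List


def build_trtexec_command(onnx_path: str, plan_path: str, fp16: bool = False) -> List[str]:
    """Assemble the argv list that asks trtexec to build a plan from an ONNX file.

    Hypothetical helper for illustration; it does not run anything itself.
    """
    cmd = [
        "trtexec",
        f"--onnx={onnx_path}",        # input: the exported ONNX model
        f"--saveEngine={plan_path}",  # output: the serialized TensorRT plan
    ]
    if fp16:
        cmd.append("--fp16")          # allow half-precision kernels
    return cmd


if __name__ == "__main__":
    # File names are placeholders, not taken from the post.
    print(" ".join(build_trtexec_command("model.onnx", "model.plan", fp16=True)))
```

On a machine with TensorRT installed, the returned list could be passed to `subprocess.run(cmd, check=True)`. Keeping command construction separate from execution makes the step easy to inspect without a GPU.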
Speeding up T5
2 projects | /r/LanguageTechnology | 22 Jan 2023

I've tried to speed it up with TensorRT and followed this example: https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb - it does give a considerable speedup for batch size 1, but it does not work with bigger batch sizes, which makes it useless, as I can simply increase the batch size of the HuggingFace model.
An open-source library for optimizing deep learning inference. (1) You select the target optimization, (2) nebullvm searches for the best optimization techniques for your model-hardware configuration, and then (3) serves an optimized model that runs much faster in inference
10 projects | /r/learnmachinelearning | 26 Jul 2022

Open-source projects leveraged by nebullvm include OpenVINO, TensorRT, Intel Neural Compressor, SparseML and DeepSparse, Apache TVM, ONNX Runtime, TFlite and XLA. A huge thank you to the open-source community for developing and maintaining these amazing projects.
I was looking for some great quantization open-source libraries that could actually be applied in production (both edge or cloud CPU/GPU). Do you know if I am missing any good libraries?
4 projects | /r/learnmachinelearning | 14 Jul 2022

Nvidia Quantization | Quantization with TensorRT
Can you run a quantized model on GPU?
4 projects | /r/deeplearning | 25 Jun 2022

You might want to try Nvidia's quantization toolkit for pytorch: https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization
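For readers unfamiliar with the term, quantization here means mapping floating-point values to low-precision integers such as int8. The sketch below is a generic, pure-Python illustration of symmetric int8 quantization with a single scale. It is not the API of the linked toolkit, and the function names `quantize_int8` and `dequantize` are hypothetical.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to integers in [-127, 127] using one symmetric scale."""
    amax = max(abs(v) for v in values)        # largest magnitude in the tensor
    scale = amax / 127.0 if amax > 0 else 1.0  # avoid dividing by zero
    quantized = [max(-127, min(127, round(v / scale))) for v in values]
    return quantized, scale


def dequantize(quantized: List[int], scale: float) -> List[float]:
    """Recover approximate floats from the integers and the scale."""
    return [q * scale for q in quantized]


if __name__ == "__main__":
    q, s = quantize_int8([-1.0, 0.0, 0.25, 1.0])
    print(q, s)
    print(dequantize(q, s))
```

The reconstruction error per value is at most about half of `scale`, which is why the choice of the maximum magnitude matters for accuracy.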

onnx-tensorrt

Posts with mentions or reviews of onnx-tensorrt. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2021-06-02.

[P] [D]How to get TensorFlow model to run on Jetson Nano?
4 projects | /r/MachineLearning | 2 Jun 2021

Conversion was done from Keras/TensorFlow to ONNX using https://github.com/onnx/keras-onnx, followed by ONNX to TensorRT using https://github.com/onnx/onnx-tensorrt. The Python code used for inference using TensorRT can be found at https://github.com/jonnor/modeld/blob/tensorrt/tensorrtutils.py

What are some alternatives?

When comparing TensorRT and onnx-tensorrt you can also consider the following projects:

DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

FasterTransformer - Transformer related optimization, including BERT, GPT

onnxruntime - ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator

vllm - A high-throughput and memory-efficient inference and serving engine for LLMs

openvino - OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference

stable-diffusion-webui - Stable Diffusion web UI

flash-attention - Fast and memory-efficient exact attention

jetson-inference - Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.

tvm - Open deep learning compiler stack for cpu, gpu and specialized accelerators

tensorrtx - Implementation of popular deep learning networks with TensorRT network definition API

llama.cpp - LLM inference in C/C++

whisper - Robust Speech Recognition via Large-Scale Weak Supervision
