TensorRT Alternatives
Similar projects and alternatives to TensorRT
- jan: Jan is an open source alternative to ChatGPT that runs 100% offline on your computer, with multiple engine support (llama.cpp, TensorRT-LLM).
- DeepSpeed: DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
- kernl: Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable (see the sketch after this list).
- transformer-deploy: Efficient, scalable and enterprise-grade CPU/GPU inference server for 🤗 Hugging Face transformer models 🚀
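For the kernl entry above, here is a minimal sketch of the advertised single-line usage. It assumes the `optimize_model` entry point from kernl's README and an fp16-capable CUDA GPU; the exact API and constraints may differ across versions.

```python
# Hedged sketch of kernl's one-line optimization (entry point assumed
# from its README; requires a recent CUDA GPU and fp16 autocast).
import torch
from transformers import AutoModel
from kernl.model_optimization import optimize_model

model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
optimize_model(model)  # the advertised "single line of code"

inputs = {
    "input_ids": torch.ones((1, 16), dtype=torch.long, device="cuda"),
    "attention_mask": torch.ones((1, 16), dtype=torch.long, device="cuda"),
}
with torch.inference_mode(), torch.cuda.amp.autocast():
    output = model(**inputs)
```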
TensorRT discussion
TensorRT reviews and mentions
- The 6 Best LLM Tools To Run Models Locally
Extensions: Jan supports extensions like TensorRT and Inference Nitro for customizing and enhancing your AI models.
- AMD MI300X 30% higher performance than Nvidia H100, even with optimized stack
- Getting SDXL-turbo running with TensorRT
(python demo_txt2img.py "a beautiful photograph of Mt. Fuji during cherry blossom"). https://github.com/NVIDIA/TensorRT/tree/release/8.6/demo/Diffusion
- Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration
- https://github.com/NVIDIA/TensorRT
TVM and other compiler-based approaches seem to perform really well and make supporting different backends easy. A good friend who's been in this space for a while told me llama.cpp is sort of a "hand-crafted" version of what these compilers could output, which I think speaks both to the craftsmanship Georgi and the ggml team have put into llama.cpp and to the opportunity to "compile" versions of llama.cpp for other model architectures or platforms. (A rough sketch of the compiler flow follows below.)
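For illustration, the compile-then-run flow the comment describes looks roughly like this in TVM. Treat it as a sketch only: it uses the older Relay frontend, and module paths and signatures vary across TVM releases.

```python
# Rough sketch of a compiler-based flow with TVM's Relay frontend;
# import paths and signatures are assumptions from older releases.
import torch
import tvm
from tvm import relay
from tvm.contrib import graph_executor

# Trace a toy PyTorch model so the compiler can import its graph.
model = torch.nn.Linear(16, 16).eval()
example = torch.randn(1, 16)
traced = torch.jit.trace(model, example)

# Import into Relay and compile for a generic CPU target.
mod, params = relay.frontend.from_pytorch(traced, [("input0", example.shape)])
with tvm.transform.PassContext(opt_level=3):
    lib = relay.build(mod, target="llvm", params=params)

# Run the compiled artifact.
dev = tvm.cpu()
runtime = graph_executor.GraphModule(lib["default"](dev))
runtime.set_input("input0", tvm.nd.array(example.numpy()))
runtime.run()
print(runtime.get_output(0).numpy().shape)
```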
- Nvidia Introduces TensorRT-LLM for Accelerating LLM Inference on H100/A100 GPUs
https://github.com/NVIDIA/TensorRT/issues/982
Maybe? It looks like TensorRT does work, but I couldn't find much.
- Train Your AI Model Once and Deploy on Any Cloud
Highly optimized transformer-based encoder and decoder components, supported on PyTorch, TensorFlow, and Triton.
TensorRT: a custom ML framework / inference runtime from Nvidia (https://developer.nvidia.com/tensorrt), but you have to port your models.
- A1111 just added support for TensorRT for webui as an extension!
- WIP - TensorRT-accelerated Stable Diffusion img2img from a mobile camera over WebRTC + Whisper speech-to-text. Interdimensional cable is here! Code: https://github.com/venetanji/videosd
It uses the Nvidia demo code from: https://github.com/NVIDIA/TensorRT/tree/main/demo/Diffusion
- [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl
The traditional way to deploy a model is to export it to ONNX, then to the TensorRT plan format (a sketch of this pipeline follows below). Each step requires its own tooling and its own mental model, and may raise issues. The most annoying part is that you need Microsoft or Nvidia support to get the best performance, and sometimes model support takes time. For instance, T5, a model released in 2019, is still not correctly supported on TensorRT; in particular, the K/V cache is missing (soon it will come, according to the TensorRT maintainers, but I wrote the very same thing almost a year ago and then 4 months ago, so… I don't know).
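To make the two steps concrete, here is a hedged sketch of the usual PyTorch → ONNX → TensorRT plan path using TensorRT's Python builder API (TensorRT 8+). The toy model and file names are placeholders, not the post's actual setup.

```python
# Sketch of the "export to ONNX, then to TensorRT plan" pipeline;
# model, file names, and shapes are illustrative placeholders.
import torch
import tensorrt as trt

# Step 1: export a (toy) PyTorch model to ONNX.
model = torch.nn.Linear(16, 16).eval()
dummy = torch.randn(1, 16)
torch.onnx.export(model, dummy, "model.onnx",
                  input_names=["input"], output_names=["output"])

# Step 2: parse the ONNX graph and build a serialized engine ("plan").
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
plan = builder.build_serialized_network(network, config)  # TensorRT >= 8
with open("model.plan", "wb") as f:
    f.write(plan)
```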
- Speeding up T5
I've tried to speed it up with TensorRT, following this example: https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb - it does give a considerable speedup for batch-size=1, but it does not work with bigger batch sizes, which makes it useless, since I can simply increase the batch size of the HuggingFace model. (A sketch of building with a dynamic-batch optimization profile follows below.)
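One plausible reason an engine only works at batch-size=1 is a static batch dimension baked in at build time. Below is a hedged sketch of the usual fix: building with an optimization profile that declares a dynamic batch. The input name, file names, and shapes are placeholders, and the ONNX export must already mark the batch axis as dynamic.

```python
# Sketch: build a TensorRT engine with a dynamic batch dimension via an
# optimization profile; "input_ids" and all shapes are placeholders.
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)
with open("t5_encoder.onnx", "rb") as f:  # assumes a dynamic batch axis in the ONNX
    if not parser.parse(f.read()):
        raise RuntimeError(parser.get_error(0))

config = builder.create_builder_config()
profile = builder.create_optimization_profile()
# min / opt / max (batch, seq_len) shapes the engine should accept.
profile.set_shape("input_ids", (1, 1), (8, 128), (32, 512))
config.add_optimization_profile(profile)

plan = builder.build_serialized_network(network, config)
with open("t5_encoder.plan", "wb") as f:
    f.write(plan)
```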
Stats
NVIDIA/TensorRT is an open source project licensed under the Apache License 2.0, an OSI-approved license.
The primary programming language of TensorRT is C++.