TensorRT
examples
| | TensorRT | examples |
|---|---|---|
| Mentions | 22 | 142 |
| Stars | 8,891 | 7,699 |
| Growth | 3.6% | 1.2% |
| Activity | 6.1 | 6.2 |
| Latest commit | about 2 months ago | 7 days ago |
| Language | C++ | Jupyter Notebook |
| License | Apache License 2.0 | Apache License 2.0 |
Stars - the number of stars that a project has on GitHub. Growth - month over month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
TensorRT
- Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration
- https://github.com/NVIDIA/TensorRT
TVM and other compiler-based approaches seem to perform really well and make supporting different backends easy. A good friend who's been in this space for a while told me llama.cpp is sort of a "hand-crafted" version of what these compilers could output, which I think speaks to the craftsmanship Georgi and the ggml team have put into llama.cpp, but also to the opportunity to "compile" versions of llama.cpp for other model architectures or platforms.
- Nvidia Introduces TensorRT-LLM for Accelerating LLM Inference on H100/A100 GPUs
https://github.com/NVIDIA/TensorRT/issues/982
Maybe? It looks like TensorRT does work, but I couldn't find much.
- Train Your AI Model Once and Deploy on Any Cloud
A highly optimized transformer-based encoder and decoder component, supported on PyTorch, TensorFlow, and Triton.
TensorRT, Nvidia's custom ML framework/inference runtime, https://developer.nvidia.com/tensorrt, but you have to port your models.
- A1111 just added support for TensorRT for webui as an extension!
- WIP - TensorRT accelerated stable diffusion img2img from mobile camera over webrtc + whisper speech to text. Interdimensional cable is here! Code: https://github.com/venetanji/videosd
It uses the nvidia demo code from: https://github.com/NVIDIA/TensorRT/tree/main/demo/Diffusion
- [P] Get 2x Faster Transcriptions with OpenAI Whisper Large on Kernl
The traditional way to deploy a model is to export it to ONNX, then to the TensorRT plan format. Each step requires its own tooling and its own mental model, and may raise issues. The most annoying thing is that you need Microsoft or Nvidia support to get the best performance, and sometimes model support takes time. For instance, T5, a model released in 2019, is still not correctly supported on TensorRT; in particular, the K/V cache is missing (soon it will be, according to TensorRT maintainers, but I wrote the very same thing almost 1 year ago and then 4 months ago, so… I don't know).
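As a rough illustration of that two-step export path, here is a minimal sketch assuming TensorRT's 8.x Python API and a throwaway PyTorch model; the tiny model, file names, and FP16 flag are placeholders, not anything from the post.

```python
# Minimal sketch of the ONNX -> TensorRT plan path (TensorRT 8.x Python API).
import torch
import tensorrt as trt

# Step 1: export a (throwaway) PyTorch model to ONNX.
model = torch.nn.Sequential(torch.nn.Linear(16, 4), torch.nn.ReLU()).eval()
dummy_input = torch.randn(1, 16)
torch.onnx.export(model, dummy_input, "model.onnx", opset_version=17)

# Step 2: parse the ONNX file and build a serialized TensorRT engine ("plan").
logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(
    1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH)
)
parser = trt.OnnxParser(network, logger)
with open("model.onnx", "rb") as f:
    if not parser.parse(f.read()):
        for i in range(parser.num_errors):
            print(parser.get_error(i))
        raise RuntimeError("ONNX parsing failed")

config = builder.create_builder_config()
config.set_flag(trt.BuilderFlag.FP16)  # optional reduced precision
plan = builder.build_serialized_network(network, config)
with open("model.plan", "wb") as f:
    f.write(plan)
```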
- Speeding up T5
I've tried to speed it up with TensorRT and followed this example: https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb - it does give a considerable speedup for batch-size=1, but it does not work with bigger batch sizes, which makes it useless for me since I can simply increase the batch size of the HuggingFace model.
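For context, a TensorRT engine only accepts input shapes covered by an optimization profile chosen at build time, so larger batches usually mean rebuilding the engine with a dynamic batch dimension. A minimal sketch of how that is typically declared with the TensorRT 8.x Python API follows; the tensor name "input_ids" and the sequence length 128 are assumptions, not values from the notebook.

```python
# Sketch: giving an engine a dynamic batch dimension so one plan can serve
# batch sizes 1..32 (TensorRT 8.x API).
import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
config = builder.create_builder_config()

profile = builder.create_optimization_profile()
profile.set_shape(
    "input_ids",   # assumed input tensor name
    (1, 128),      # min: smallest batch the engine must accept
    (8, 128),      # opt: shape TensorRT tunes its kernels for
    (32, 128),     # max: largest batch the engine must accept
)
config.add_optimization_profile(profile)
# ...then parse an ONNX graph whose batch axis is dynamic (-1) and call
# builder.build_serialized_network(network, config) as in the usual flow.
```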
- An open-source library for optimizing deep learning inference. (1) You select the target optimization, (2) nebullvm searches for the best optimization techniques for your model-hardware configuration, and then (3) serves an optimized model that runs much faster in inference
Open-source projects leveraged by nebullvm include OpenVINO, TensorRT, Intel Neural Compressor, SparseML and DeepSparse, Apache TVM, ONNX Runtime, TFLite and XLA. A huge thank you to the open-source community for developing and maintaining these amazing projects.
- I was looking for some great quantization open-source libraries that could actually be applied in production (both edge and cloud, CPU/GPU). Do you know if I am missing any good libraries?
Nvidia Quantization | Quantization with TensorRT
- Can you run a quantized model on GPU?
You might want to try Nvidia's quantization toolkit for pytorch: https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization
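For anyone curious what that toolkit looks like in practice, here is a heavily hedged sketch: quant_modules.initialize() patches common torch.nn layers with fake-quantized variants so a model can later be calibrated and exported toward INT8 TensorRT. Calibration and export are omitted here, and the details should be checked against the repo's docs.

```python
# Hedged sketch of NVIDIA's pytorch-quantization toolkit
# (pip install pytorch-quantization).
import torch
from pytorch_quantization import quant_modules

quant_modules.initialize()  # patch nn.Linear, nn.Conv2d, ... with quantized versions

model = torch.nn.Sequential(  # these layers are now the quantized variants
    torch.nn.Linear(16, 32),
    torch.nn.ReLU(),
    torch.nn.Linear(32, 4),
)
print(type(model[0]))  # expected: a pytorch_quantization QuantLinear
```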
examples
- Open Source Ascendant: The Transformation of Software Development in 2024
AI's Open Embrace: Artificial intelligence (AI) and machine learning (ML) are increasingly leveraging open-source frameworks like TensorFlow [https://www.tensorflow.org/] and PyTorch [https://pytorch.org/]. This democratization of AI tools is driving innovation and lowering entry barriers across industries.
- Best AI Tools for Students Learning Development and Engineering
Which label applies to a tool sometimes depends on what you do with it. For example, PyTorch or TensorFlow can be called a library, a toolkit, or a machine-learning framework.
- Releasing The Force Of Machine Learning: A Novice’s Guide 😃
TensorFlow: An open-source machine learning framework for high-performance numerical computations, especially well-suited for deep learning.
- MLOps in practice: building and deploying a machine learning app
The tool used to build the model per se was TensorFlow, a very powerful, end-to-end open-source platform for machine learning with a rich ecosystem of tools. And in order to create the needed script using TensorFlow, Jupyter Notebook was used, which is a web-based interactive computing platform.
- 🔥14 Excellent Open-source Projects for Developers😎
10. TensorFlow - Make Machine Learning Work for You 🤖
- 🔥🚀 Top 10 Open-Source Must-Have Tools for Crafting Your Own Chatbot 🤖💬
#2 TensorFlow
- Are there people out there who still like Sam Altman - AI IS AT DANGER
- How popular are libraries in each technology
Machine learning is the process of using algorithms and statistical models to enable computers to learn from data. There are many tools and libraries available for machine learning, but the most popular by far is TensorFlow. TensorFlow is an open-source platform for machine learning developed by Google. It has over 176k stars on GitHub and is used by companies such as Airbnb and Intel.
- React + Tensorflow.js, a cool recipe for AI-powered applications
Tensorflow is Google's "end-to-end machine learning platform". It's a framework to manage the whole lifecycle of a Machine Learning (and AI) project, from data preparation to production deployment. Remember the math stuff we talked about in the last section? Tensorflow manages that in addition to a lot of other stuff. Its core API is written for Python and you have to know your math just a little bit in order to play with it. It's more for deep learning models (neural networks) and has a lot of already implemented "layers" for you to use in your network. You can prepare data (images included with the option of image augmentation for small data sets ... yay! 😃), experiment with different model architectures, tune the model's hyperparameters (a fancy name for model configs), train, validate and test your models and monitor your models in production. It's a great framework, but it is not an easy one to learn, especially if you don't like math that much!
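To make the lifecycle in that description concrete, here is a minimal, self-contained Keras sketch covering data preparation, model definition, training, and evaluation; the synthetic data and two-layer network are purely illustrative, not code from the article.

```python
# Minimal Keras sketch of the lifecycle described above: prepare data,
# define a small network from prebuilt layers, train, and evaluate.
import numpy as np
import tensorflow as tf

# "Data preparation": 1,000 random 20-feature samples with a binary label.
x = np.random.rand(1000, 20).astype("float32")
y = (x.sum(axis=1) > 10).astype("float32")

# Model architecture assembled from ready-made layers.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(32, activation="relu", input_shape=(20,)),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])

# Hyperparameters ("model configs") live in the compile/fit arguments.
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(x, y, epochs=5, batch_size=32, validation_split=0.2)
print(model.evaluate(x, y, verbose=0))
```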
- List of AI-Models
What are some alternatives?
DeepSpeed - DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.
FasterTransformer - Transformer related optimization, including BERT, GPT
onnx-tensorrt - ONNX-TensorRT: TensorRT backend for ONNX
vllm - A high-throughput and memory-efficient inference and serving engine for LLMs
stable-diffusion-webui - Stable Diffusion web UI
openvino - OpenVINO™ is an open-source toolkit for optimizing and deploying AI inference
flash-attention - Fast and memory-efficient exact attention
tvm - Open deep learning compiler stack for cpu, gpu and specialized accelerators
tensorrtx - Implementation of popular deep learning networks with TensorRT network definition API
llama.cpp - LLM inference in C/C++
whisper - Robust Speech Recognition via Large-Scale Weak Supervision
whisper.cpp - Port of OpenAI's Whisper model in C/C++