I've tried https://github.com/Ki6an/fastT5, but it only supports CPU inference.
I've also tried to speed it up with TensorRT, following this example: https://github.com/NVIDIA/TensorRT/blob/main/demo/HuggingFace/notebooks/t5.ipynb. It gives a considerable speedup at batch_size=1, but it does not work with larger batch sizes, which makes it useless for me, since at that point I can simply increase the batch size of the plain HuggingFace model instead.
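One likely cause of the batch-size limit is that the demo builds the TensorRT engine with a fixed batch dimension. TensorRT does support variable batch sizes through dynamic shapes with an optimization profile set at build time. Below is a rough, untested sketch of that idea using TensorRT's Python API; the ONNX file path `t5_encoder.onnx` and the input name `"input_ids"` are assumptions that would need to match your exported model.

```python
# Sketch: build a TensorRT engine whose "input_ids" input accepts
# batch sizes 1..max_batch and sequence lengths 1..512.
# Assumes TensorRT 8.x Python bindings and an exported ONNX encoder
# at t5_encoder.onnx (hypothetical path / input name).
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_dynamic_engine(onnx_path: str, max_batch: int = 8) -> bytes:
    builder = trt.Builder(TRT_LOGGER)
    # Explicit-batch network: the batch dimension is part of the tensor shape.
    network = builder.create_network(
        1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
    parser = trt.OnnxParser(network, TRT_LOGGER)

    with open(onnx_path, "rb") as f:
        if not parser.parse(f.read()):
            raise RuntimeError(f"ONNX parse failed: {parser.get_error(0)}")

    config = builder.create_builder_config()
    # The optimization profile declares min/opt/max shapes for each
    # dynamic input; the engine is then valid for any shape in range.
    profile = builder.create_optimization_profile()
    profile.set_shape("input_ids",
                      (1, 1),            # min:  batch 1, seq len 1
                      (max_batch, 128),  # opt:  shapes tuned for this case
                      (max_batch, 512))  # max:  upper bound
    config.add_optimization_profile(profile)

    return builder.build_serialized_network(network, config)
```

At inference time the execution context's input shape must be set to the actual batch before running (`context.set_input_shape` in recent TensorRT versions). This requires that the ONNX export itself marked the batch and sequence axes as dynamic; if the export hard-coded them, the engine build will reject the profile.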