https://docs.nvidia.com/ai-enterprise/overview/0.1.0/platfor...
Riva: NVIDIA® Riva, a premium edition of NVIDIA AI Enterprise software, is a GPU-accelerated speech and translation AI SDK.
FasterTransformer (https://github.com/NVIDIA/FasterTransformer): a highly optimized transformer-based encoder and decoder component, supported on PyTorch, TensorFlow, and Triton.
TensorRT (https://developer.nvidia.com/tensorrt): a custom ML framework / inference runtime from NVIDIA, but you have to port your models to it.
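The usual porting path is to export the model to ONNX first and then compile it into a TensorRT engine. A minimal sketch using the `trtexec` CLI that ships with TensorRT (the file names `model.onnx` / `model.engine` are placeholders; assumes an NVIDIA GPU and a TensorRT installation):

```shell
# Compile an ONNX model into a serialized TensorRT engine,
# enabling FP16 kernels where the hardware supports them.
trtexec --onnx=model.onnx \
        --saveEngine=model.engine \
        --fp16

# Benchmark inference latency/throughput of the built engine.
trtexec --loadEngine=model.engine
```

If the model uses ops that TensorRT doesn't support, the build fails at parse time, which is where the "porting" work (rewriting layers or adding plugins) comes in.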
Related posts
- Can you run a quantized model on GPU?
- [P] Python library to optimize Hugging Face transformer for inference: < 0.5 ms latency / 2850 infer/sec
- AMD MI300X 30% higher performance than Nvidia H100, even with optimized stack
- Getting SDXL-turbo running with tensorRT
- Show HN: Ollama for Linux – Run LLMs on Linux with GPU Acceleration