Can you run a quantized model on GPU?

This page summarizes the projects mentioned and recommended in the original post on /r/deeplearning

  • XNOR-popcount-GEMM-PyTorch-CPU-CUDA

    A PyTorch implementation of a real XNOR-popcount (1-bit op) GEMM Linear extension, supporting both CPU and CUDA (a minimal sketch of the XNOR-popcount idea follows this list).

  • Binary-Convolutional-Neural-Network-Inference-on-GPU

    GPU implementation of an XNOR network at the inference level.

  • SBNN

    Singular Binarized Neural Network based on GPU Bit Operations (see our SC-19 paper)

  • TensorRT

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

  • You might want to try NVIDIA's quantization toolkit for PyTorch: https://github.com/NVIDIA/TensorRT/tree/main/tools/pytorch-quantization (a minimal usage sketch follows below)
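
A minimal, hedged sketch of how the pytorch-quantization toolkit is typically used (assumptions: pytorch-quantization and torchvision are installed, a CUDA GPU is available, and the calibration pass that sets the quantizer ranges is elided here):

    import torch
    import torchvision
    from pytorch_quantization import quant_modules
    from pytorch_quantization import nn as quant_nn

    # Monkey-patch torch.nn layers (Conv2d, Linear, ...) with quantized variants
    # so the model is built with fake-quantization nodes.
    quant_modules.initialize()
    model = torchvision.models.resnet18(weights=None).cuda().eval()

    # ... run a calibration pass over representative data here so each
    #     TensorQuantizer gets its amax range before export ...

    # Export with fake-quant (QuantizeLinear/DequantizeLinear) ops that TensorRT
    # understands, then build an INT8 engine from the ONNX file.
    quant_nn.TensorQuantizer.use_fb_fake_quant = True
    dummy = torch.randn(1, 3, 224, 224, device="cuda")
    torch.onnx.export(model, dummy, "resnet18_int8.onnx", opset_version=13)

The exported ONNX file can then be compiled into a TensorRT engine (for example with trtexec --onnx=resnet18_int8.onnx --int8) so the quantized model actually runs on the GPU.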
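
For context on the XNOR-popcount projects listed above, here is a minimal NumPy illustration of the 1-bit GEMM idea (illustrative only, not those extensions' actual API): with activations and weights constrained to {-1, +1}, a dot product reduces to counting matching sign bits, i.e. popcount(xnor(a, w)).

    import numpy as np

    def xnor_popcount_dot(a, w):
        # a, w: float vectors with values in {-1, +1}
        n = a.size
        a_bits = np.packbits((a > 0).astype(np.uint8))   # 1 bit per element
        w_bits = np.packbits((w > 0).astype(np.uint8))
        # xnor = NOT xor; count positions where the sign bits match
        mismatches = int(np.unpackbits(np.bitwise_xor(a_bits, w_bits))[:n].sum())
        matches = n - mismatches
        return 2 * matches - n                            # = sum(a * w) over {-1, +1}

    a = np.where(np.random.randn(64) > 0, 1.0, -1.0)
    w = np.where(np.random.randn(64) > 0, 1.0, -1.0)
    assert xnor_popcount_dot(a, w) == int(a @ w)

The CUDA extensions above do roughly the same thing with hardware popcount instructions on packed 32/64-bit words, which is where the speedup over full-precision GEMM comes from.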

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts