I'm looking for good open-source quantization libraries that can actually be used in production, on both edge and cloud (CPU or GPU). Do you know if I am missing any good libraries?

TensorRT

Mentions: 22 | Stars: 9,031 | Activity: 5.0 | Language: C++

NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

Nvidia Quantization | Quantization with TensorRT

nebuly

Mentions: 105 | Stars: 8,368 | Activity: 8.4 | Language: Python

The user analytics platform for LLMs

Nebullvm | Easy-to-use library to boost AI inference leveraging state-of-the-art optimization techniques

aimet

Mentions: 2 | Stars: 1,889 | Activity: 9.6 | Language: Python

AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

Qualcomm AIMET | Advanced quantization and compression techniques for trained neural network models

Model-Compression-Research-Package

Mentions: 1 | Stars: 132 | Activity: 5.3 | Language: Python

A library for researching neural networks compression and acceleration methods.

Intel Labs compression | Researching neural networks compression and acceleration methods.

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.
