I was looking for good open-source quantization libraries that can actually be applied in production (on both edge and cloud, CPU or GPU). Do you know of any good libraries I am missing?

This page summarizes the projects mentioned and recommended in the original post on /r/learnmachinelearning.

  • TensorRT

    NVIDIA® TensorRT™ is an SDK for high-performance deep learning inference on NVIDIA GPUs. This repository contains the open source components of TensorRT.

    Nvidia Quantization | Quantization with TensorRT

  • nebuly

    The user analytics platform for LLMs

    Nebullvm | Easy-to-use library to boost AI inference leveraging state-of-the-art optimization techniques

  • aimet

    AIMET is a library that provides advanced quantization and compression techniques for trained neural network models.

    Qualcomm AIMET | Advanced quantization and compression techniques for trained neural network models

  • Model-Compression-Research-Package

    A library for researching neural networks compression and acceleration methods.

    Intel Labs compression | Researching neural networks compression and acceleration methods.
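
As a rough illustration of the basic operation these libraries automate, here is a minimal sketch of symmetric per-tensor int8 quantization in plain Python. It does not use the API of any project listed above; the function names `quantize_int8` and `dequantize_int8` and the sample weights are hypothetical, chosen only for illustration. Real toolkits add calibration, per-channel scales, and hardware-specific kernels on top of this idea.

```python
from typing import List, Tuple


def quantize_int8(values: List[float]) -> Tuple[List[int], float]:
    """Map floats to int8 codes in [-127, 127] with a single shared scale.

    Hypothetical helper for illustration; not taken from any listed library.
    """
    max_abs = max((abs(v) for v in values), default=0.0)
    # Avoid division by zero when every value is 0.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    # Round to the nearest code, then clamp to the symmetric int8 range.
    codes = [max(-127, min(127, round(v / scale))) for v in values]
    return codes, scale


def dequantize_int8(codes: List[int], scale: float) -> List[float]:
    """Recover approximate float values from int8 codes and the scale."""
    return [c * scale for c in codes]


# Sample weights (made up for this sketch).
weights = [0.5, -1.27, 0.003, 1.0]
codes, scale = quantize_int8(weights)
restored = dequantize_int8(codes, scale)

# The rounding error per value is at most half a quantization step.
max_error = max(abs(w - r) for w, r in zip(weights, restored))
print(codes, scale, max_error)
```

The single shared scale keeps the sketch simple but lets one large outlier reduce precision for every other value, which is one reason production libraries tend to offer per-channel scales and calibration data.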

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.
