Want to understand INT8 better

This page summarizes the projects mentioned and recommended in the original post on /r/CUDA

  • cutlass

    CUDA Templates for Linear Algebra Subroutines

  • The latter (and I guess you were asking about this one) is designed to accelerate NN inference in reduced precision. It is possible to use Tensor Cores for your own purposes, mainly through CUTLASS. But because Tensor Cores are built to execute matrix multiplications and nothing else, it can be hard to adapt your problem to them. The performance is insane (IIRC 32x that of the INT32 pipeline), but only for matrix multiplication…
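
    To make "adapt your problem to matrix multiplication" a bit more concrete, here is a minimal CPU-side sketch of the arithmetic that an INT8 matrix-multiply path is generally understood to perform: 8-bit signed operands, with products summed in a 32-bit accumulator. This is not CUTLASS code and does not touch Tensor Cores; the function name `gemm_int8` and the row-major layout are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical reference routine (not part of CUTLASS):
// C[m x n] = A[m x k] * B[k x n], all matrices stored row-major.
// Inputs are int8; each dot product is accumulated in int32 so that
// sums of many 8-bit products do not overflow the narrow input type.
std::vector<std::int32_t> gemm_int8(const std::vector<std::int8_t>& a,
                                    const std::vector<std::int8_t>& b,
                                    std::size_t m, std::size_t n, std::size_t k) {
    assert(a.size() == m * k && b.size() == k * n);
    std::vector<std::int32_t> c(m * n, 0);
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            std::int32_t acc = 0;
            for (std::size_t p = 0; p < k; ++p) {
                // Widen before multiplying: the product of two int8 values
                // can reach 16384, far outside the int8 range.
                acc += static_cast<std::int32_t>(a[i * k + p]) *
                       static_cast<std::int32_t>(b[p * n + j]);
            }
            c[i * n + j] = acc;
        }
    }
    return c;
}
```

    A problem fits this hardware well only if it can be phrased as a call of this shape, for example a batch of dot products stacked into the rows of A and the columns of B. Anything that does not reduce to multiply-and-accumulate over matrices stays on the regular integer pipeline.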

NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts