Optimization of CUDA Elementwise Template Library: Practical, Efficient, and Extensible

This page summarizes the projects mentioned and recommended in the original post on /r/CUDA.

  • oneflow

    OneFlow is a deep learning framework designed to be user-friendly, scalable and efficient.

  • An elementwise operation applies a function to every element of a tensor. In deep learning, many operators can be regarded as elementwise operators, such as common activation functions (like ReLU and GELU) and ScalarMultiply (multiplying each element of a tensor by a scalar). For such operations, OneFlow (https://github.com/Oneflow-Inc/oneflow/) abstracts a CUDA template. This article introduces the design ideas and optimization techniques behind that template.
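
    To make the idea of "one template, many elementwise operators" concrete, here is a minimal sketch. It is not OneFlow's actual code: the names `ScalarMultiplyFunctor`, `ReluFunctor`, `ApplyUnary`, `LaunchUnary`, and `ScaleOnDevice` are hypothetical, and the block size and grid cap are arbitrary assumptions. It only shows how a functor can carry the per-element math while a single generic kernel handles indexing.

    ```cuda
    #include <cuda_runtime.h>
    #include <cstdint>
    #include <vector>

    // Hypothetical functor: multiplies each element by a fixed scalar.
    template <typename T>
    struct ScalarMultiplyFunctor {
      T scalar;
      __host__ __device__ explicit ScalarMultiplyFunctor(T s) : scalar(s) {}
      __device__ T operator()(T x) const { return x * scalar; }
    };

    // Hypothetical functor: ReLU, shown to illustrate that only the functor changes.
    template <typename T>
    struct ReluFunctor {
      __device__ T operator()(T x) const { return x > T(0) ? x : T(0); }
    };

    // Generic kernel: a grid-stride loop applies the functor to every element.
    template <typename Functor, typename T>
    __global__ void ApplyUnary(Functor f, int64_t n, T* out, const T* in) {
      const int64_t stride = static_cast<int64_t>(blockDim.x) * gridDim.x;
      for (int64_t i = static_cast<int64_t>(blockIdx.x) * blockDim.x + threadIdx.x;
           i < n; i += stride) {
        out[i] = f(in[i]);
      }
    }

    // Launcher: picks a launch configuration (assumed values, not tuned).
    template <typename Functor, typename T>
    cudaError_t LaunchUnary(Functor f, int64_t n, T* out, const T* in,
                            cudaStream_t stream = 0) {
      if (n <= 0) return cudaSuccess;
      constexpr int kBlockSize = 256;  // assumption
      int64_t blocks = (n + kBlockSize - 1) / kBlockSize;
      if (blocks > 4096) blocks = 4096;  // arbitrary cap; the grid-stride loop covers the rest
      ApplyUnary<<<static_cast<unsigned int>(blocks), kBlockSize, 0, stream>>>(f, n, out, in);
      return cudaGetLastError();
    }

    // Host helper for illustration; error checking is omitted for brevity.
    std::vector<float> ScaleOnDevice(const std::vector<float>& host_in, float scalar) {
      const int64_t n = static_cast<int64_t>(host_in.size());
      std::vector<float> host_out(host_in.size());
      if (n == 0) return host_out;
      float* d_in = nullptr;
      float* d_out = nullptr;
      cudaMalloc(&d_in, n * sizeof(float));
      cudaMalloc(&d_out, n * sizeof(float));
      cudaMemcpy(d_in, host_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
      LaunchUnary(ScalarMultiplyFunctor<float>(scalar), n, d_out, d_in);
      // A blocking cudaMemcpy on the default stream waits for the kernel to finish.
      cudaMemcpy(host_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);
      cudaFree(d_in);
      cudaFree(d_out);
      return host_out;
    }
    ```

    Calling `ScaleOnDevice({1.0f, 2.0f, 3.0f}, 2.0f)` runs the ScalarMultiply case; passing `ReluFunctor<float>()` to `LaunchUnary` instead would reuse the same kernel for ReLU. A production template would likely add further optimizations that this sketch leaves out.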

