[P] BetterTransformer: PyTorch-native free-lunch speedups for Transformer-based models

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • kernl

    Kernl lets you run PyTorch transformer models several times faster on GPU with a single line of code, and is designed to be easily hackable.
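
    As a minimal sketch of what that single line looks like (assuming kernl's optimize_model entry point from its README; exact names may differ across versions):

        import torch
        from transformers import AutoModel
        from kernl.model_optimization import optimize_model

        model = AutoModel.from_pretrained("bert-base-uncased").eval().cuda()
        optimize_model(model)  # the advertised "single line": swaps modules for Triton kernels

        inputs = {
            "input_ids": torch.randint(0, 30522, (1, 128), device="cuda"),
            "attention_mask": torch.ones(1, 128, dtype=torch.int64, device="cuda"),
        }
        # kernl targets fp16 inference on GPU, hence autocast + inference_mode.
        with torch.inference_mode(), torch.cuda.amp.autocast():
            out = model(**inputs)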

  • FlashAttention + quantization has, to the best of my knowledge, not yet been explored, but I think it would be a great engineering direction. I would not expect to see it natively in PyTorch's BetterTransformer any time soon, though. /u/pommedeterresautee & folks at ELS-RD did awesome work releasing kernl, where custom implementations (through OpenAI Triton) could perhaps easily live.
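
    For context, PyTorch 2.x already routes BetterTransformer-style attention through scaled_dot_product_attention, which can dispatch to a FlashAttention kernel (unquantized). A hedged illustration, assuming a supported GPU and dtype:

        import torch
        import torch.nn.functional as F
        from torch.backends.cuda import sdp_kernel

        q = torch.randn(1, 8, 128, 64, device="cuda", dtype=torch.float16)
        k, v = torch.randn_like(q), torch.randn_like(q)

        # Restrict dispatch to the FlashAttention kernel to confirm it is usable here.
        with sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
            out = F.scaled_dot_product_attention(q, k, v)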

  • onnxruntime

    ONNX Runtime: cross-platform, high performance ML inferencing and training accelerator
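
    A minimal sketch of the export-then-run flow (model name and file path are placeholders):

        import onnxruntime as ort
        import torch
        from transformers import AutoModel, AutoTokenizer

        tok = AutoTokenizer.from_pretrained("bert-base-uncased")
        model = AutoModel.from_pretrained("bert-base-uncased").eval()

        enc = tok("hello world", return_tensors="pt")
        # Export once with dynamic batch/sequence axes...
        torch.onnx.export(
            model,
            (enc["input_ids"], enc["attention_mask"]),
            "model.onnx",
            input_names=["input_ids", "attention_mask"],
            output_names=["last_hidden_state"],
            dynamic_axes={"input_ids": {0: "batch", 1: "seq"},
                          "attention_mask": {0: "batch", 1: "seq"}},
            opset_version=17,
        )

        # ...then run through ONNX Runtime.
        sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
        out = sess.run(None, {k: v.numpy() for k, v in enc.items()})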

  • Are you doing dynamic or static quantization? Static quantization can be tricky; dynamic quantization is usually more straightforward. Also, if you are dealing with encoder-decoder models, quantization error may accumulate in the decoder. As for the slowdowns you are seeing, there could be many reasons. The first thing to check is whether running through ONNX Runtime / OpenVINO is at least on par with (if not better than) PyTorch eager. If not, there may be an issue at a higher level (e.g. here). If yes, it could be that your CPU does not support AVX-VNNI instructions, for example. The speedups from quantization can also vary greatly with batch size and sequence length.
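
    To make the distinction concrete, here is a minimal sketch of dynamic quantization in PyTorch, the "more straightforward" option above: weights are quantized ahead of time, activations on the fly, and no calibration data is needed (model name is a placeholder):

        import torch
        from transformers import AutoModelForSequenceClassification

        model = AutoModelForSequenceClassification.from_pretrained(
            "distilbert-base-uncased-finetuned-sst-2-english"
        ).eval()

        # Quantize only the Linear layers to int8; everything else stays fp32.
        qmodel = torch.quantization.quantize_dynamic(
            model, {torch.nn.Linear}, dtype=torch.qint8
        )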

  • optimum

    🚀 Accelerate training and inference of 🤗 Transformers and 🤗 Diffusers with easy to use hardware optimization tools
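
    Since the post is about BetterTransformer, here is a minimal sketch of Optimum's one-call transform (assuming the optimum.bettertransformer API in recent releases):

        import torch
        from transformers import AutoModel
        from optimum.bettertransformer import BetterTransformer

        model = AutoModel.from_pretrained("bert-base-uncased").eval()
        # One call swaps supported layers for PyTorch-native fastpath kernels.
        bt_model = BetterTransformer.transform(model)

        enc = {"input_ids": torch.randint(0, 30522, (1, 128)),
               "attention_mask": torch.ones(1, 128, dtype=torch.int64)}
        with torch.inference_mode():
            out = bt_model(**enc)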

  • Yes, the Optimum lib's documentation is unfortunately not yet in the best shape. I would be really thankful if you filed an issue detailing where the doc can be improved: https://github.com/huggingface/optimum/issues . Also, if you have feature requests, such as a more flexible API, we are eager for community contributions and suggestions!
