[R] Data Movement Is All You Need: A Case Study on Optimizing Transformers

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • DeepLearningExamples

    State-of-the-Art Deep Learning scripts organized by models - easy to train and deploy with reproducible accuracy and performance on enterprise-grade infrastructure.

    Nvidia's implementation of BERT still has a long way to go (I don't know whether it implements the input-independent gradient computations in the backward pass that the paper describes; see the sketch after this list). But there are scaled benchmarks on DGX A100s: https://github.com/NVIDIA/DeepLearningExamples/tree/master/TensorFlow/LanguageModeling/BERT

  • Megatron-LM

    Ongoing research training transformer models at scale

    Nvidia's own implementation of Transformers, i.e., Megatron, trained at scale on NVIDIA's Selene supercomputer (where GPT-3-scale training is possible too): https://github.com/NVIDIA/Megatron-LM (a tensor-parallelism sketch also follows below)
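
As a rough illustration of the "input-independent gradient computation" point above: for a linear layer y = x @ w, the weight gradient (x.T @ dy) and the input gradient (dy @ w.T) do not depend on each other, so a backward pass is free to reorder or overlap the two GEMMs to cut data movement. A minimal NumPy sketch of that independence (illustrative only; not Nvidia's or the paper's actual implementation):

    import numpy as np

    def linear_forward(x, w):
        # y = x @ w; cache the input for the backward pass
        return x @ w, x

    def linear_backward(dy, x_cached, w):
        # The two gradient GEMMs are mutually independent:
        # dw needs only the cached input and dy; dx needs only w and dy.
        # A scheduler can compute them in either order, or concurrently.
        dw = x_cached.T @ dy  # weight gradient
        dx = dy @ w.T         # gradient propagated to the previous layer
        return dx, dw

    rng = np.random.default_rng(0)
    x = rng.standard_normal((8, 16))   # (batch, in_features)
    w = rng.standard_normal((16, 32))  # (in_features, out_features)
    y, cache = linear_forward(x, w)
    dx, dw = linear_backward(np.ones_like(y), cache, w)
    assert dx.shape == x.shape and dw.shape == w.shape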

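Megatron's core idea (from the Megatron-LM paper) is intra-layer tensor parallelism: the first MLP weight matrix is split column-wise across GPUs and the second row-wise, so each GPU applies GeLU locally and a single all-reduce reconstructs the output. A minimal single-process NumPy sketch that simulates two ranks (illustrative; the real implementation is PyTorch plus NCCL collectives):

    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.standard_normal((4, 8))    # (tokens, hidden)
    a = rng.standard_normal((8, 32))   # first MLP weight
    b = rng.standard_normal((32, 8))   # second MLP weight

    def gelu(t):
        # tanh approximation of GeLU (elementwise, so it commutes
        # with a column-wise split of the preceding GEMM)
        return 0.5 * t * (1.0 + np.tanh(np.sqrt(2 / np.pi) * (t + 0.044715 * t**3)))

    # Reference: serial MLP, z = GeLU(x @ a) @ b
    z_ref = gelu(x @ a) @ b

    # Tensor-parallel version: split `a` column-wise, `b` row-wise.
    a_shards = np.split(a, 2, axis=1)  # each "rank" holds an (8, 16) shard
    b_shards = np.split(b, 2, axis=0)  # each "rank" holds a (16, 8) shard

    # Each rank computes its partial output with no communication...
    partials = [gelu(x @ a_i) @ b_i for a_i, b_i in zip(a_shards, b_shards)]
    # ...and one all-reduce (here: a sum) yields the full result.
    z_tp = sum(partials)

    assert np.allclose(z_ref, z_tp)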

