How to Train Large Models on Many GPUs?

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • composer

    Supercharge Your Model Training (by mosaicml)

  • Mosaic's open-source library is excellent: Composer https://github.com/mosaicml/composer.

    * It gives you PyTorch DDP for free, makes FSDP about as easy as it can be, and provides best-in-class performance monitoring tools. https://docs.mosaicml.com/en/v0.12.1/notes/distributed_train...

    Here's a nice intro to using Huggingface models: https://docs.mosaicml.com/en/v0.12.1/examples/finetune_huggi...

    I'm just a huge fan of their developer experience. It's up there with Transformers and Datasets as the nicest tools to use.
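To make the "PyTorch DDP for free" point concrete: Composer's Trainer handles the distributed setup that you would otherwise write by hand. As a rough sketch of what that boilerplate looks like in plain PyTorch (not Composer's API), here is a minimal single-process DDP run on CPU with the gloo backend — real jobs launch one process per GPU via torchrun, with matching `rank`/`world_size`:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def train_step():
    # Single-process demo values; torchrun normally sets these per worker.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29501")
    dist.init_process_group("gloo", rank=0, world_size=1)

    # DDP wraps the model; gradients are all-reduced across ranks in backward().
    model = DDP(torch.nn.Linear(8, 1))
    opt = torch.optim.SGD(model.parameters(), lr=0.1)

    x, y = torch.randn(16, 8), torch.randn(16, 1)
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()  # gradient synchronization happens here
    opt.step()

    dist.destroy_process_group()
    return loss.item()

final_loss = train_step()
print(final_loss)
```

Composer (and FSDP, which additionally shards parameters and optimizer state across ranks) replaces all of this setup with a few Trainer arguments.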

  • alpa

    Training and serving large-scale neural networks with auto parallelization.

  • Alpa does training and serving with 175B parameter models https://github.com/alpa-projects/alpa
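For intuition on what Alpa's auto-parallelization searches over: one of the partitioning strategies it can choose is intra-operator (tensor) parallelism, where a single layer's weights are split across devices. Here is a hand-rolled illustration in plain PyTorch (not Alpa's API) of column-parallel splitting for a linear layer, with the two shards standing in for two devices:

```python
import torch

torch.manual_seed(0)
x = torch.randn(4, 8)   # activations, replicated on both "devices"
w = torch.randn(8, 6)   # weight matrix to be partitioned

# Each "device" holds half of the output columns of w.
w_shard0, w_shard1 = w.chunk(2, dim=1)

# Each shard computes its slice locally; concatenating plays the role
# of the all-gather that reassembles the full output.
y_parallel = torch.cat([x @ w_shard0, x @ w_shard1], dim=1)

y_full = x @ w  # reference: the unpartitioned computation
print(torch.allclose(y_parallel, y_full))
```

Alpa automates the choice between partitionings like this and inter-operator (pipeline) parallelism across the whole model, which is what makes 175B-parameter training feasible without hand-written sharding code.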

  • datasets

    🤗 The largest hub of ready-to-use datasets for ML models with fast, easy-to-use and efficient data manipulation tools

  • https://github.com/huggingface/datasets

    https://github.com/huggingface/transformers

  • transformers

    🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.


NOTE: The number of mentions on this list counts mentions in common posts plus user-suggested alternatives, so a higher number indicates a more popular project.
