ZeRO-3 Offload: Scale DL models to trillion parameters without code changes

This page summarizes the projects mentioned and recommended in the original post on news.ycombinator.com

  • gpt-neox

    An implementation of model parallel autoregressive transformers on GPUs, based on the DeepSpeed library.

  • GPT-NeoX is an example project that uses DeepSpeed and ZeRO-3 offloading; the wider project intends to train a GPT-3-sized model and release it freely to the world. A minimal ZeRO-3 offload configuration sketch appears after this list.

    https://github.com/EleutherAI/gpt-neox

  • fairseq

    Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

  • Support for this was also added to [Fairscale](https://fairscale.readthedocs.io/en/latest/) and [Fairseq](https://github.com/pytorch/fairseq) last week. In particular, the Fairscale implementation can be used in any PyTorch project without requiring the DeepSpeed trainer (see the Fairscale sketch after this list).

  • Pytorch

    Tensors and Dynamic neural networks in Python with strong GPU acceleration

  • This is also being added to PyTorch; see the PyTorch sketch after this list.

    https://github.com/pytorch/pytorch/pull/46750

  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  • Hi! I’m the one who wrote this code. My ZeRO-3 implementation is currently not working, but I’ve spoken with DeepSpeed devs and they’ve explained to me what I’ve been doing wrong. I haven’t had time to implement the fix but I don’t see any reason to assume it won’t work.

    https://github.com/microsoft/DeepSpeed/issues/846
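
As a rough illustration of the ZeRO-3 offload setup discussed above, here is a minimal sketch of a DeepSpeed configuration that enables stage-3 sharding with CPU offload of parameters and optimizer state. The config keys follow the DeepSpeed documentation, but the model, batch size, and learning rate are placeholders, and exact argument names can vary across DeepSpeed versions; the script is meant to be started with the `deepspeed` launcher.

```python
import torch
import deepspeed

# Placeholder model; in practice this would be a large transformer.
model = torch.nn.Linear(1024, 1024)

# ZeRO stage 3 shards parameters, gradients, and optimizer state across ranks;
# the offload_* sections move parameters and optimizer state into CPU memory.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "fp16": {"enabled": True},
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 3,
        "offload_param": {"device": "cpu"},
        "offload_optimizer": {"device": "cpu"},
    },
}

# deepspeed.initialize returns an engine that handles sharding, offload,
# mixed precision, and the optimizer step.
model_engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```

From there, training follows the usual DeepSpeed pattern: compute a loss from `model_engine(batch)`, then call `model_engine.backward(loss)` and `model_engine.step()` instead of the plain PyTorch equivalents.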
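
The Fairscale route mentioned above wraps an ordinary PyTorch module in `FullyShardedDataParallel`, so no DeepSpeed trainer is needed. The sketch below is a minimal, assumption-laden example (toy model, `torchrun`-style launch, NCCL backend); CPU offload of the sharded parameters is controlled by additional constructor flags whose names have changed across Fairscale versions, so they are omitted here.

```python
import torch
import torch.distributed as dist
from fairscale.nn import FullyShardedDataParallel as FSDP

def main() -> None:
    # FSDP assumes torch.distributed is already initialized, e.g. via torchrun.
    dist.init_process_group(backend="nccl")
    torch.cuda.set_device(dist.get_rank() % torch.cuda.device_count())

    # Toy stand-in for a large model.
    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096),
        torch.nn.ReLU(),
        torch.nn.Linear(4096, 1024),
    ).cuda()

    # FSDP shards parameters, gradients, and optimizer state across ranks,
    # gathering full parameters only around each layer's forward/backward.
    sharded_model = FSDP(model)
    optimizer = torch.optim.Adam(sharded_model.parameters(), lr=1e-4)

    x = torch.randn(8, 1024, device="cuda")
    loss = sharded_model(x).sum()
    loss.backward()          # gradients are reduce-scattered by FSDP hooks
    optimizer.step()         # each rank updates only its own shard

if __name__ == "__main__":
    main()
```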
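
For the PyTorch pull request linked above, the underlying idea of sharding optimizer state across data-parallel workers is what PyTorch now exposes as `torch.distributed.optim.ZeroRedundancyOptimizer` (the ZeRO stage-1 piece). The following is an illustrative sketch, not code from that PR; the toy model and hyperparameters are placeholders and a `torchrun` launch is assumed.

```python
import torch
import torch.distributed as dist
from torch.distributed.optim import ZeroRedundancyOptimizer
from torch.nn.parallel import DistributedDataParallel as DDP

def main() -> None:
    dist.init_process_group(backend="nccl")
    device = torch.device("cuda", dist.get_rank() % torch.cuda.device_count())

    # Toy model wrapped in DDP for gradient synchronization.
    model = DDP(torch.nn.Linear(2048, 2048).to(device), device_ids=[device.index])

    # Each rank keeps only its shard of the Adam state (exp_avg, exp_avg_sq),
    # cutting optimizer memory roughly by the world size.
    optimizer = ZeroRedundancyOptimizer(
        model.parameters(),
        optimizer_class=torch.optim.Adam,
        lr=1e-4,
    )

    x = torch.randn(16, 2048, device=device)
    model(x).sum().backward()
    optimizer.step()  # updated parameters are broadcast back to all ranks

if __name__ == "__main__":
    main()
```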

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • A comprehensive guide to running Llama 2 locally

    19 projects | news.ycombinator.com | 25 Jul 2023
  • Cleared AWS Machine Learning - Specialty exam.. Happy to help!!!

    2 projects | /r/AWSCertifications | 4 Apr 2023
  • People tricking ChatGPT “like watching an Asimov novel come to life”

    1 project | news.ycombinator.com | 2 Dec 2022
  • Good practices for neural network training: identify, save, and document best models

    1 project | dev.to | 4 Jan 2022
  • D I Refuse To Use Pytorch Because Its A Facebook

    1 project | /r/MachineLearning | 29 Dec 2020