[D] Fine-tuning GPT-J: lessons learned

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning.

  • Finetune_LLMs

    Repo for fine-tuning causal LLMs (a minimal fine-tuning sketch follows this list)

  • See also: https://github.com/mallorbc/Finetune_GPTNEO_GPTJ6B

  • DeepSpeed

    DeepSpeed is a deep learning optimization library that makes distributed training and inference easy, efficient, and effective.

  • Regarding inference, DeepSpeed does not yet support GPT-J, though support is planned: https://github.com/microsoft/DeepSpeed/issues/1332 (see the hedged inference sketch after this list)

  • mesh-transformer-jax

    Model parallel transformers in JAX and Haiku (a toy model-parallel sketch follows this list)

  • I don't know; I was surprised too. The config recommended for TPUs in the how-to we followed is hard to compare with the one we used for fine-tuning on GPUs with DeepSpeed, so the two setups may not be exactly equivalent and we may be comparing apples to oranges...
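
To make the fine-tuning item above concrete, here is a minimal sketch of fine-tuning GPT-J with the Hugging Face Transformers Trainer and a DeepSpeed ZeRO config. This is an assumption-laden sketch, not the Finetune_LLMs repo's actual script: the model name is real, but `train.txt` and `ds_config.json` are hypothetical placeholders you would supply yourself.

```python
import torch
from datasets import load_dataset
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # GPT-J ships without a pad token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Any line-per-example text file works; "train.txt" is a placeholder.
dataset = load_dataset("text", data_files={"train": "train.txt"})["train"]
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=["text"],
)

args = TrainingArguments(
    output_dir="gptj-finetuned",
    per_device_train_batch_size=1,
    gradient_accumulation_steps=8,
    num_train_epochs=1,
    fp16=True,
    deepspeed="ds_config.json",  # hypothetical ZeRO config; this is what lets 6B params fit
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=dataset,
    # mlm=False gives causal-LM labels (labels = shifted input_ids)
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

Launched with `deepspeed train.py`, the ZeRO stage chosen in `ds_config.json` determines how optimizer state, gradients, and parameters are partitioned across GPUs.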
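
For reference on the inference point: this is the general shape of DeepSpeed's inference API around the time of the linked issue. GPT-J specifically was not yet covered by kernel injection, so treat this as a sketch of how supported models were wrapped; argument names follow the docs of that era and may have changed since.

```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6B"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Wrap the model with DeepSpeed's inference engine (fused kernels, tensor
# parallelism across mp_size GPUs). NOTE: per the issue above, kernel
# injection did not cover GPT-J at the time of the original post.
engine = deepspeed.init_inference(
    model,
    mp_size=1,
    dtype=torch.half,
    replace_with_kernel_inject=True,
)

inputs = tokenizer("Fine-tuning GPT-J taught us", return_tensors="pt").to("cuda")
outputs = engine.module.generate(**inputs, max_new_tokens=32)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```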
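
And to make the "model parallel" idea behind mesh-transformer-jax concrete: a toy JAX sketch that shards one weight matrix column-wise across local devices with `pmap`, so each device computes its slice of the output. This is purely didactic, assuming only that the output width divides evenly across devices; the repo itself uses xmap/Haiku over a 2D device mesh, which this does not reproduce.

```python
import jax
import jax.numpy as jnp
import numpy as np

n_dev = jax.local_device_count()
d_in, d_out = 512, 512 * n_dev  # output dim split evenly across devices

rng = np.random.default_rng(0)
w = rng.standard_normal((d_in, d_out)).astype(np.float32)
# Column-shard the weights: device i holds w[:, i*512:(i+1)*512]
w_shards = np.stack(np.split(w, n_dev, axis=1))  # (n_dev, d_in, d_out // n_dev)

x = rng.standard_normal((8, d_in)).astype(np.float32)
x_rep = np.stack([x] * n_dev)  # replicate activations on every device

@jax.pmap
def shard_matmul(x, w_shard):
    # Each device computes only its slice of the output columns.
    return x @ w_shard

y_shards = shard_matmul(x_rep, w_shards)           # (n_dev, 8, d_out // n_dev)
y = jnp.concatenate(list(y_shards), axis=-1)        # gather to full (8, d_out)
```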


