Large Language Models: Compairing Gen2/Gen3 Models (GPT-3, GPT-J, MT5 and More)

This page summarizes the projects mentioned and recommended in the original post on dev.to

Our great sponsors
  • WorkOS - The modern identity platform for B2B SaaS
  • InfluxDB - Power Real-Time Data Analytics at Scale
  • SaaSHub - Software Alternatives and Reviews
  • mesh-transformer-jax

    Model parallel transformers in JAX and Haiku

  • GPT-J is a LLM case study with two goals: Training a LLM with a data source containing unique material, and using the training frameworkMesh Transformer JAX to achieve a high training efficiency through parallelization. There is no research paper about GPT-J, but on its GitHub pages, the model, different checkpoints, and the complete source code for training is given.

  • math-lm

  • The training material is named The Pile, a 800GB large corpus consisting of 22 different sources, including scientific research papers from ArXiV, legal documents from the the FreeLaw Project, and eBooks from Project Gutenberg campus. As shown in its documentation, GPT-J performance is on par with the GPT-3 6B model. Also, the model can be used for advanced theorem proving and natural language understanding.

  • WorkOS

    The modern identity platform for B2B SaaS. The APIs are flexible and easy-to-use, supporting authentication, user identity, and complex enterprise features like SSO and SCIM provisioning.

    WorkOS logo
  • Megatron-LM

    Ongoing research training transformer models at scale

  • This 20B model was trained on the same datasets as its predecessor, aptly named The Pile. Furthermore, the libraries Megatron and DeepSpeed were used to achieve better computing resource utilization, and eventually GPT-NeoX evolved into its own framework for training other LLMs. It was used, for example, as the foundation for Llemma, an open-source model specializing on theorem proving.

NOTE: The number of mentions on this list indicates mentions on common posts plus user suggested alternatives. Hence, a higher number means a more popular project.

Suggest a related project

Related posts