Large Language Models: Comparing Gen2/Gen3 Models (GPT-3, GPT-J, MT5 and More)

mesh-transformer-jax

52 6,213 0.0 Python

Model parallel transformers in JAX and Haiku

GPT-J is an LLM case study with two goals: training an LLM on a data source containing unique material, and using the training framework Mesh Transformer JAX to achieve high training efficiency through parallelization. There is no research paper about GPT-J, but its GitHub pages provide the model, several checkpoints, and the complete source code for training.
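
To make the idea of parallelization more concrete, here is a minimal sketch of column-wise tensor (model) parallelism in plain Python. It is not taken from Mesh Transformer JAX; the function names (`matmul`, `split_columns`, `column_parallel_matmul`) and the notion of "shards" standing in for devices are assumptions made purely for illustration. The point it shows is that splitting a weight matrix into column blocks and concatenating the partial outputs gives the same result as the unsplit multiplication.

```python
# Toy illustration of tensor (model) parallelism: the weight matrix of a
# linear layer is split column-wise across hypothetical devices ("shards").
# This is an illustrative sketch, not code from Mesh Transformer JAX.


def matmul(x, w):
    """Multiply a (rows x inner) matrix by an (inner x cols) matrix."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*w)] for row in x]


def split_columns(w, num_shards):
    """Split matrix w into num_shards column blocks of equal width."""
    cols = len(w[0])
    if cols % num_shards != 0:
        raise ValueError("column count must be divisible by num_shards")
    width = cols // num_shards
    return [[row[i * width:(i + 1) * width] for row in w] for i in range(num_shards)]


def column_parallel_matmul(x, w, num_shards):
    """Compute x @ w by giving each shard one column block of w."""
    shards = split_columns(w, num_shards)
    # In a real system each partial product would run on a different device.
    partials = [matmul(x, shard) for shard in shards]
    # Concatenate the partial outputs along the column axis.
    return [sum((p[r] for p in partials), []) for r in range(len(x))]


x = [[1.0, 2.0], [3.0, 4.0]]
w = [[1.0, 0.0, 2.0, 1.0], [0.0, 1.0, 1.0, 3.0]]
full = matmul(x, w)
sharded = column_parallel_matmul(x, w, num_shards=2)
print(full == sharded)
```

In a real framework the shards live on separate accelerators and the concatenation is a collective communication step; the sketch only demonstrates the arithmetic equivalence.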

math-lm

2 971 8.4 Python

The training material is The Pile, an 800 GB corpus drawn from 22 different sources, including scientific research papers from arXiv, legal documents from the FreeLaw Project, and eBooks from Project Gutenberg. As shown in its documentation, GPT-J's performance is on par with the GPT-3 6.7B model. The model can also be used for advanced theorem proving and natural language understanding.

Megatron-LM

18 8,561 9.9 Python

Ongoing research training transformer models at scale

This 20B model, GPT-NeoX, was trained on the same dataset as its predecessor, the aptly named The Pile. The libraries Megatron and DeepSpeed were used to achieve better utilization of computing resources, and GPT-NeoX eventually evolved into its own framework for training other LLMs. It was used, for example, as the foundation for Llemma, an open-source model specializing in theorem proving.

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

This page summarizes the projects mentioned and recommended in the original post on dev.to. Post date: 12 Feb 2024
