I think there remains an immense amount of such suboptimality still hanging from the tree, so to speak.
For example, our recent paper "Tensor Programs V: Tuning Large Neural Networks via Zero-Shot Hyperparameter Transfer"[1] shows that even learning rate and initialization used by existing models are deeply wrong. By just picking them correctly (which involves some really beautiful mathematics), we can effectively double the model size of the GPT-3 6.7B model (to be comparable in quality to the 13B model across the suite of benchmark tasks).
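The transfer idea above can be sketched in a few lines. This is a hedged illustration, not the paper's full recipe: it applies the commonly cited muP scaling rules for Adam-trained hidden layers (learning rate ~ 1/width, init variance ~ 1/width), so that hyperparameters tuned on a narrow proxy model carry over to a much wider one without re-tuning. The function name and the specific base values are illustrative assumptions.

```python
# Hedged sketch of muP-style hyperparameter transfer (Tensor Programs V).
# Tune on a small "proxy" model of width base_width, then rescale the
# hyperparameters for the wide target model instead of re-tuning them.

def mup_scaled_hparams(base_lr, base_init_std, base_width, target_width):
    """Rescale a tuned (lr, init_std) pair from base_width to target_width.

    Uses the commonly cited muP rules for Adam-trained hidden layers:
    learning rate shrinks like 1/width, init std like 1/sqrt(width).
    """
    ratio = target_width / base_width
    return {
        "hidden_lr": base_lr / ratio,               # lr ~ 1/width
        "hidden_init_std": base_init_std / ratio ** 0.5,  # var ~ 1/width
    }

# Example: tune at width 256, transfer to a 16x wider model.
scaled = mup_scaled_hparams(base_lr=1e-3, base_init_std=0.02,
                            base_width=256, target_width=4096)
```

The point of the paper is that, under this parametrization, the *optimal* small-model hyperparameters stay near-optimal at the large width, so the expensive sweep only happens once on the cheap proxy.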
Large neural networks behave in ways we are only beginning to understand, in part because each empirical probe of such a model is far more expensive and time-consuming than for typical models. But principled theory here can have a lot of leverage by pointing out the right direction to look, as it did in our work.
[1] http://arxiv.org/abs/2203.03466
It implies our models are wrong.
Consider that a human adolescence is ~9.46x10^6 minutes and a fast speaking rate is ~200 words/minute. That sets an upper bound of 1.9 billion words heard during adolescence, i.e., human adults are trained on a corpus of less than 1.9B words.
To some extent, more data can offset worse models, but I don't think that's the regime we're currently in. GPT-3 was trained on (among other languages) 181 billion English words [1] - or about 100 times more words than a human will hear by the time they reach adulthood. How is the human brain able to achieve a higher level of success with 1% of the data?
1. https://github.com/openai/gpt-3/blob/master/dataset_statisti...
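The arithmetic behind both figures above can be checked directly. A back-of-the-envelope sketch, assuming an 18-year adolescence (the ~9.46x10^6-minute figure) and the 181B English-word count cited for GPT-3:

```python
# Back-of-the-envelope check of the word counts in the comment above.
minutes = 18 * 365.25 * 24 * 60      # ~18 years of adolescence, in minutes
words_per_minute = 200               # fast speaking rate
human_words = minutes * words_per_minute   # upper bound on words heard

gpt3_english_words = 181e9           # English words in GPT-3's training data
ratio = gpt3_english_words / human_words

print(f"human upper bound: {human_words:.2e} words")   # ~1.89e9, just under 1.9B
print(f"GPT-3 / human: ~{round(ratio)}x")              # ~96x, i.e. roughly 100x
```

So the "less than 1.9B words" bound and the "about 100 times more" ratio are both consistent with the stated assumptions.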
Common Crawl actually does not contain Twitter; you can go check the indexes with https://github.com/ikreymer/cdx-index-client . Twitter is extremely aggressive about blocking scraping/caching, and I guess that keeps CC out. Models like GPT-3 still know a decent amount of Twitter material, and I figure that this is due to tweets being excerpted or mirrored at non-Twitter.com URLs (e.g., all the Twitter-mirroring bots on Reddit).
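Checking this claim yourself is straightforward: the Common Crawl CDX index API at index.commoncrawl.org accepts a URL pattern and returns one JSON record per capture. A minimal sketch of building such a query; the collection name here is just an example (current collections are listed on the index page), and the network fetch itself is left to the reader:

```python
# Minimal sketch: build a Common Crawl CDX index query for a domain, to
# check whether any of its pages were captured in a given crawl.
from urllib.parse import urlencode

def cdx_query_url(collection, url_pattern):
    """Build a CDX index query URL (JSON output, one record per capture)."""
    base = f"http://index.commoncrawl.org/{collection}-index"
    return base + "?" + urlencode({"url": url_pattern, "output": "json"})

# "CC-MAIN-2023-50" is an example collection name; check the index page
# for the ones that actually exist.
query = cdx_query_url("CC-MAIN-2023-50", "twitter.com/*")
# Fetching `query` (e.g. with urllib.request) yields one JSON object per
# capture; an empty or not-found response means no captures matched.
```

For twitter.com the claim above predicts no (or essentially no) matching captures, while mirrors of tweets on other domains would still show up under their own URLs.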