I believe tuning hyperparameters well, without a lot of wasted compute, for the largest models was only figured out by Greg Yang and Microsoft Research around 2022 (the μTransfer paper, cited in the GPT-4 report):
https://arxiv.org/abs/2203.03466
It's also part of how they predicted the loss ahead of time so well.
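The core idea of μTransfer is that, under the right parameterization (μP), optimal hyperparameters become stable across model width, so you can sweep them on a small proxy model and reuse them on the large one. A minimal sketch of one piece of this, assuming the simplified rule that the Adam learning rate for hidden (matrix-like) weights scales as 1/width relative to the tuned base model (function names here are illustrative, not from the paper's `mup` package):

```python
# Hedged sketch of μTransfer-style learning-rate scaling (Yang et al. 2022,
# arXiv:2203.03466). Simplification: under μP with Adam, the learning rate
# for hidden-layer weights scales as 1/width relative to a tuned base model,
# so a value found by sweeping on a small proxy transfers to a wider model.

def mup_hidden_lr(base_lr: float, base_width: int, target_width: int) -> float:
    """Scale a base learning rate, tuned at base_width, to target_width."""
    return base_lr * base_width / target_width

# Tune once at width 256, then reuse at width 8192 without re-sweeping:
small_lr = 0.01  # hypothetical value found on the small proxy model
big_lr = mup_hidden_lr(small_lr, base_width=256, target_width=8192)
print(big_lr)  # 0.01 * 256 / 8192
```

The real method (the `mup` library) applies per-layer rules rather than one global scale, but the width-based rescaling above is the gist of why small-model sweeps transfer.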
Related posts
- Cerebras Open Sources Seven GPT models and Introduces New Scaling Law
- OpenAI’s policies hinder reproducible research on language models
- [R] Greg Yang's work on a rigorous mathematical theory for neural networks
- DeepMind’s New Language Model, Chinchilla (70B Parameters), Which Outperforms GPT-3
- "Training Compute-Optimal Large Language Models", Hoffmann et al 2022 {DeepMind} (current LLMs are significantly undertrained)