[D] How to pick a learning rate scheduler?

RAdam

4 2,520 0.0 Python

On the Variance of the Adaptive Learning Rate and Beyond

Common practice is to include some type of annealing (cosine, linear, etc.), which makes intuitive sense. For Adam/AdamW, it's generally a good idea to include a warmup in the LR schedule: without it, the gradient distribution in the early steps can be distorted, and the optimizer can get trapped in a bad local minimum (see the RAdam paper, "On the Variance of the Adaptive Learning Rate and Beyond"). That paper and subsequent works also introduce optimizers (RAdam, Ranger, and variants) that don't require a warmup stage to stabilize the gradients. In general, if you're using Adam/AdamW, include a warmup and some annealing, either linear or cosine. If you're using RAdam/Ranger/variants, you can skip the warmup. How many steps to use for warmup and annealing is probably problem-specific and requires some hyperparameter tuning to get optimal results.
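To make the "warmup plus annealing" advice concrete, here is a minimal sketch of a linear warmup followed by cosine decay, written as a plain multiplier on the base learning rate. The function name `warmup_cosine_factor` and its parameters (`warmup_steps`, `total_steps`, `min_factor`) are hypothetical, chosen for illustration; they don't come from the thread or from any library, and the step counts you'd actually use are a tuning question.

```python
import math


def warmup_cosine_factor(step, warmup_steps, total_steps, min_factor=0.0):
    """Return the multiplier applied to the base learning rate at `step`.

    Illustrative helper (not from the thread or any library):
    - steps [0, warmup_steps): linear ramp from 1/warmup_steps up to 1.0
    - steps [warmup_steps, total_steps]: cosine decay from 1.0 to min_factor
    - steps beyond total_steps: held at min_factor
    """
    if total_steps <= warmup_steps:
        raise ValueError("total_steps must be greater than warmup_steps")

    # Linear warmup; skipped entirely when warmup_steps == 0.
    if step < warmup_steps:
        return (step + 1) / warmup_steps

    # Fraction of the annealing phase completed, clamped to [0, 1].
    progress = min(1.0, (step - warmup_steps) / (total_steps - warmup_steps))

    # Cosine curve going from 1.0 (progress = 0) to 0.0 (progress = 1).
    cosine = 0.5 * (1.0 + math.cos(math.pi * progress))
    return min_factor + (1.0 - min_factor) * cosine


if __name__ == "__main__":
    # Print the multiplier at a few steps of a 100-step run with 10 warmup steps.
    for s in (0, 5, 10, 55, 100):
        print(s, round(warmup_cosine_factor(s, 10, 100), 4))
```

In PyTorch, a function like this could be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, e.g. `LambdaLR(optimizer, lr_lambda=lambda s: warmup_cosine_factor(s, 500, 10_000))`, with `scheduler.step()` called once per optimizer step. With RAdam/Ranger, the same idea would presumably apply with `warmup_steps=0`, keeping only the annealing part.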

Related posts

[D] Why does a sudden increase in accuracy at a specific epoch in these model
3 projects | /r/MachineLearning | 19 Dec 2021

Why is my loss choppy?
2 projects | /r/reinforcementlearning | 1 Aug 2021

Optimizer obsoletes step-size scheduling, 100% on MNIST's training set 11 epochs
1 project | news.ycombinator.com | 7 Jan 2021

Using both Learning Rate Warmup and Learning Rate Decay Schedule in PyTorch
2 projects | /r/deeplearning | 17 Apr 2023

[N] LR Warmup for PyTorch
1 project | /r/MachineLearning | 7 Apr 2022

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning
Tags: Optimizer, adam, adam-optimizer, warmup
Post date: 4 Aug 2021
