[D] How to pick a learning rate scheduler?

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • RAdam

    On the Variance of the Adaptive Learning Rate and Beyond

  • Common practice is to include some type of annealing (cosine, linear, etc.), which makes intuitive sense. For Adam/AdamW, it's generally a good idea to include a warmup in the LR schedule, as the gradient distribution without the warmup can be distorted, leading to the optimizer being trapped in a bad local minimum; see this paper. There are also optimizers introduced in this paper and subsequent works (RAdam, Ranger, and variants) that don't require a warmup stage to stabilize the gradients. In general, if you're using Adam/AdamW, include a warmup and some annealing, either linear or cosine; if you're using RAdam/Ranger/variants, you can skip the warmup. How many steps to use for warmup/annealing is probably problem-specific and requires some hyperparameter tuning to get optimal results. A minimal sketch of the warmup-plus-annealing setup appears after this list.

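Below is a minimal PyTorch sketch of the advice in the comment above (AdamW with a linear warmup followed by cosine annealing). The model, learning rate, batch shape, and step counts are illustrative placeholders, not values from the thread, and would need tuning per problem.

```python
import torch
from torch import nn
from torch.optim import AdamW
from torch.optim.lr_scheduler import LinearLR, CosineAnnealingLR, SequentialLR

# Placeholder model and step budget; real values are problem-specific.
model = nn.Linear(128, 10)
total_steps = 10_000
warmup_steps = 500  # warmup length is a hyperparameter to tune

optimizer = AdamW(model.parameters(), lr=3e-4, weight_decay=0.01)

# Linear warmup from 1% of the base LR up to the full LR, then cosine
# annealing over the remaining steps.
warmup = LinearLR(optimizer, start_factor=0.01, end_factor=1.0,
                  total_iters=warmup_steps)
cosine = CosineAnnealingLR(optimizer, T_max=total_steps - warmup_steps)
scheduler = SequentialLR(optimizer, schedulers=[warmup, cosine],
                         milestones=[warmup_steps])

for step in range(total_steps):
    # Dummy batch standing in for real training data.
    x = torch.randn(32, 128)
    y = torch.randint(0, 10, (32,))
    loss = nn.functional.cross_entropy(model(x), y)

    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()  # step the schedule once per optimizer step
```

Following the comment's suggestion, swapping AdamW for a rectified variant such as torch.optim.RAdam (available in recent PyTorch versions) would let you drop the warmup stage and keep only the annealing.
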
NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.


Related posts

  • [D] Why does a sudden increase in accuracy at a specific epoch in these model

    3 projects | /r/MachineLearning | 19 Dec 2021
  • Why is my loss choppy?

    2 projects | /r/reinforcementlearning | 1 Aug 2021
  • Optimizer obsoletes step-size scheduling, 100% on MNIST's training set 11 epochs

    1 project | news.ycombinator.com | 7 Jan 2021
  • Using both Learning Rate Warmup and Learning Rate Decay Schedule in PyTorch

    2 projects | /r/deeplearning | 17 Apr 2023
  • [N] LR Warmup for PyTorch

    1 project | /r/MachineLearning | 7 Apr 2022