[R] AdasOptimizer Update: On CIFAR-100 + MobileNetV2, Adas generalizes 15% better and 9x faster than Adam

This page summarizes the projects mentioned and recommended in the original post on /r/MachineLearning

  • AdasOptimizer

    ADAS is short for Adaptive Step Size. Unlike optimizers that merely normalize the derivative, it fine-tunes the step size itself, aiming to make step-size scheduling obsolete and to achieve state-of-the-art training performance.

  • I think so too; I was comfortable posting this because the same code was used for each optimizer. If you care to find what I did wrong, go for it: https://github.com/YanaiEliyahu/AdasOptimizer/blob/master/misc/cifar-100-mobilenetv2/model_with_training.py.txt

  • imagenette

    A smaller subset of 10 easily classified classes from Imagenet, and a little more French

  • You don't need ImageNet to verify whether it really works. Check out https://github.com/fastai/imagenette: the fastai folks maintain a small subset of ImageNet that comes in three variants of the dataset, so test on those. If AdasOptimizer really works, you should be able to beat their results, or at least see where it stands. (A minimal training sketch along these lines appears after this list.)

  • DemonRangerOptimizer

    Quasi Hyperbolic Rectified DEMON Adam/Amsgrad with AdaMod, Gradient Centralization, Lookahead, iterative averaging and decorrelated Weight Decay

  • The results are interesting, but in terms of the novelty of the main theory, isn't it almost identical to Baydin et al. (https://arxiv.org/pdf/1703.04782.pdf)? It seems the difference may be in some implementation details, like using a running average for the past gradient. If it's useful, I implemented a bunch of optimizers with options to combine different techniques (https://github.com/JRC1995/DemonRangerOptimizer), including hypergradient updates (taking decorrelated weight decay and per-parameter learning rates into account for the hypergradient learning rate), back when I was bored, before practically abandoning it altogether. I didn't run any experiments with it myself; some people tried it, though they may not have gotten any particularly striking results. (A minimal sketch of the hypergradient idea appears after this list.)

  • pytorch-optimizer

    torch-optimizer -- collection of optimizers for Pytorch
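
The sketches below are referenced in the comments above. First, a minimal sanity check of the kind suggested for Imagenette: train MobileNetV2 on the 160px variant and compare optimizers by validation accuracy. The directory layout, image sizes, and the plain Adam baseline are illustrative assumptions, not part of the original post; trying Adas would just mean passing a different optimizer constructor to run().

    import torch
    import torch.nn as nn
    from torch.utils.data import DataLoader
    from torchvision import datasets, models, transforms

    def make_loaders(root="imagenette2-160", batch_size=64):
        # Assumes the 160px Imagenette archive has been downloaded and
        # extracted to ./imagenette2-160, which contains train/ and val/.
        tfm = transforms.Compose([
            transforms.Resize(160),
            transforms.CenterCrop(128),
            transforms.ToTensor(),
        ])
        train = datasets.ImageFolder(f"{root}/train", transform=tfm)
        val = datasets.ImageFolder(f"{root}/val", transform=tfm)
        return (DataLoader(train, batch_size=batch_size, shuffle=True, num_workers=4),
                DataLoader(val, batch_size=batch_size, num_workers=4))

    def accuracy(model, loader, device):
        model.eval()
        correct = total = 0
        with torch.no_grad():
            for x, y in loader:
                pred = model(x.to(device)).argmax(dim=1).cpu()
                correct += (pred == y).sum().item()
                total += y.numel()
        return correct / total

    def run(make_optimizer, epochs=5):
        device = "cuda" if torch.cuda.is_available() else "cpu"
        train_loader, val_loader = make_loaders()
        model = models.mobilenet_v2(num_classes=10).to(device)  # Imagenette has 10 classes
        opt = make_optimizer(model.parameters())
        loss_fn = nn.CrossEntropyLoss()
        for epoch in range(epochs):
            model.train()
            for x, y in train_loader:
                x, y = x.to(device), y.to(device)
                opt.zero_grad()
                loss_fn(model(x), y).backward()
                opt.step()
            print(f"epoch {epoch}: val acc {accuracy(model, val_loader, device):.3f}")

    run(lambda params: torch.optim.Adam(params, lr=1e-3))  # baseline; substitute Adas here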
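
Second, a minimal sketch of the hypergradient-descent idea from Baydin et al. (https://arxiv.org/pdf/1703.04782.pdf) that the DemonRanger comment compares Adas against: the learning rate is itself updated with the gradient of the loss with respect to the learning rate, which for plain SGD reduces to a dot product of consecutive gradients. This is SGD-HD only, not the Adas update rule; the class name and the hyper-learning-rate beta are illustrative assumptions.

    import torch

    class SGDHD(torch.optim.Optimizer):
        """Plain SGD whose learning rate is adapted by its own hypergradient."""

        def __init__(self, params, lr=1e-2, beta=1e-4):
            super().__init__(params, dict(lr=lr, beta=beta))

        @torch.no_grad()
        def step(self):
            for group in self.param_groups:
                # Hypergradient of the loss w.r.t. the learning rate is
                # -<grad_t, grad_{t-1}>, so lr <- lr + beta * <grad_t, grad_{t-1}>.
                h = 0.0
                for p in group["params"]:
                    if p.grad is None:
                        continue
                    prev = self.state[p].get("prev_grad")
                    if prev is not None:
                        h += torch.sum(p.grad * prev).item()
                    self.state[p]["prev_grad"] = p.grad.detach().clone()
                group["lr"] += group["beta"] * h
                # Ordinary SGD step with the freshly adapted learning rate.
                for p in group["params"]:
                    if p.grad is not None:
                        p.add_(p.grad, alpha=-group["lr"])

    # Usage: opt = SGDHD(model.parameters(), lr=1e-2, beta=1e-4)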
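
Finally, a small usage sketch for the pytorch-optimizer collection listed above (installed as torch-optimizer); the choice of the Yogi optimizer and its hyperparameters here is an illustrative assumption.

    import torch
    import torch_optimizer as optim  # pip install torch-optimizer

    model = torch.nn.Linear(10, 2)
    optimizer = optim.Yogi(model.parameters(), lr=1e-2)  # any optimizer from the collection works the same way

    x = torch.randn(32, 10)
    y = torch.randint(0, 2, (32,))
    loss = torch.nn.functional.cross_entropy(model(x), y)
    loss.backward()
    optimizer.step()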

NOTE: The number of mentions on this list indicates mentions on common posts plus user-suggested alternatives. Hence, a higher number means a more popular project.

Related posts

  • [D]: Implementation: Deconvolutional Paragraph Representation Learning

    1 project | /r/MachineLearning | 25 Apr 2023
  • [D] Why is there a sudden increase in accuracy at a specific epoch in these models?

    3 projects | /r/MachineLearning | 19 Dec 2021
  • VQGAN+CLIP: "RAdam" from torch_optimizer could not be imported?

    2 projects | /r/deepdream | 28 Oct 2021
  • [D] How to pick a learning rate scheduler?

    1 project | /r/MachineLearning | 4 Aug 2021
  • Why is my loss choppy?

    2 projects | /r/reinforcementlearning | 1 Aug 2021