-
AdasOptimizer
ADAS is short for Adaptive Step Size. Unlike other optimizers, which merely normalize the derivative, it fine-tunes the step size itself, aiming to make step-size scheduling obsolete and to achieve state-of-the-art training performance.
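For contrast with "optimizers that just normalize the derivative": Adam is the canonical example, rescaling the raw gradient with running moment estimates while the global step size stays a fixed hyperparameter. A minimal single-parameter NumPy sketch (names like `adam_step` are chosen here for illustration, not taken from the Adas repo):

```python
import numpy as np

def adam_step(theta, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update: the raw gradient is rescaled by running moment
    estimates, but the global step size `lr` is a fixed hyperparameter
    that Adam itself never adapts."""
    m = b1 * m + (1 - b1) * g        # running mean of gradients
    v = b2 * v + (1 - b2) * g * g    # running mean of squared gradients
    m_hat = m / (1 - b1 ** t)        # bias corrections for zero init
    v_hat = v / (1 - b2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimize f(x) = x^2; lr stays fixed at 1e-3 throughout.
theta, m, v = np.array([5.0]), np.zeros(1), np.zeros(1)
for t in range(1, 1001):
    theta, m, v = adam_step(theta, 2 * theta, m, v, t)
```

Adas's claim is that tuning `lr` itself, rather than only the normalization, removes the need to hand-schedule it.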
-
DemonRangerOptimizer
Quasi Hyperbolic Rectified DEMON Adam/Amsgrad with AdaMod, Gradient Centralization, Lookahead, iterative averaging and decorrelated Weight Decay
I think so too. I was comfortable posting this because the same code was used for each optimizer: https://github.com/YanaiEliyahu/AdasOptimizer/blob/master/misc/cifar-100-mobilenetv2/model_with_training.py.txt. If you care to find what I did wrong, go for it.
You don't need ImageNet to verify whether it really works. Check out https://github.com/fastai/imagenette — the fastai folks maintain a small subset of ImageNet that comes in three variants; test on those (a sketch of a quick baseline run follows below). If AdasOptimizer really works, you should be able to beat their results, or at least see where it stands.
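For reference, a baseline run on Imagenette takes only a few lines with fastai itself. A minimal sketch, assuming the fastai v2 API (swapping in AdasOptimizer would additionally need a custom `opt_func`, which is not shown here):

```python
from fastai.vision.all import *

# Imagenette: fastai's small, easy 10-class subset of ImageNet.
path = untar_data(URLs.IMAGENETTE_160)            # 160px variant
dls = ImageDataLoaders.from_folder(
    path, train='train', valid='val',             # folder layout of the archive
    item_tfms=Resize(160), bs=64)
learn = cnn_learner(dls, resnet18, metrics=accuracy)
learn.fit_one_cycle(5)                            # baseline with fastai's default Adam
```

Comparing this baseline against the same model trained with Adas would show where it stands without needing full ImageNet.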
The results are interesting, but in terms of the novelty of the main theory, isn't it almost identical to Baydin et al. (https://arxiv.org/pdf/1703.04782.pdf)? The difference seems to lie in implementation details, such as using a running average for the past gradient. If it's useful: I implemented a bunch of optimizers with options to synergize different techniques (https://github.com/JRC1995/DemonRangerOptimizer), including hypergradient updates (taking decorrelated weight decay and per-parameter learning rates into account for the hypergradient learning rate), back when I was bored, before practically abandoning it altogether. I never really ran any experiments with it myself; some people tried it, though they may not have gotten any particularly striking results.
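For anyone unfamiliar with the Baydin et al. comparison being drawn here: their hypergradient descent updates the learning rate itself using the dot product of successive gradients. A minimal SGD-HD sketch in plain NumPy (the hyper-learning-rate `beta` and all other names are illustrative; the paper also gives Adam variants):

```python
import numpy as np

def sgd_hd(theta, grad_fn, lr=1e-3, beta=1e-4, steps=500):
    """SGD with hypergradient descent (Baydin et al., 2018): since
    d(loss)/d(lr) at step t is approximately -g_t . g_{t-1}, the
    learning rate is nudged in the direction that would have lowered
    the loss at the previous step."""
    g_prev = np.zeros_like(theta)
    for _ in range(steps):
        g = grad_fn(theta)
        lr += beta * np.dot(g, g_prev)   # hypergradient update of lr
        theta = theta - lr * g           # ordinary SGD step
        g_prev = g
    return theta, lr

# Example: minimize f(x) = ||x||^2; lr grows while gradients agree.
theta, lr = sgd_hd(np.array([3.0, -4.0]), lambda x: 2 * x)
```

Whether Adas's running-average treatment of the past gradient meaningfully changes this update is exactly the open question raised above.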