RAdam vs pytorch_warmup

| | RAdam | pytorch_warmup |
|---|---|---|
| Mentions | 4 | 3 |
| Stars | 2,520 | 359 |
| Growth | - | - |
| Activity | 0.0 | 3.4 |
| Last commit | almost 3 years ago | 7 days ago |
| Language | Python | Python |
| License | Apache License 2.0 | MIT License |
Stars - the number of stars a project has on GitHub.
Growth - month-over-month growth in stars.
Activity - a relative measure of how actively a project is being developed; recent commits are weighted more heavily than older ones. For example, an activity of 9.0 means a project is among the top 10% of the most actively developed projects being tracked.
Posts mentioning RAdam
- [D] Why does a sudden increase in accuracy occur at a specific epoch in these models?
  Code for https://arxiv.org/abs/1908.03265 found: https://github.com/LiyuanLucasLiu/RAdam
- [D] How to pick a learning rate scheduler?
  Common practice is to include some type of annealing (cosine, linear, etc.), which makes intuitive sense. For Adam/AdamW it is generally a good idea to include a warmup in the LR schedule, because without it the early gradient statistics can be distorted, leaving the optimizer trapped in a bad local minimum; see this paper. That paper and subsequent work also introduce optimizers (RAdam, Ranger, and variants) that do not require a warmup stage to stabilize training. In general: if you are using Adam/AdamW, include a warmup plus linear or cosine annealing; if you are using RAdam/Ranger/variants, you can skip the warmup (a minimal sketch of the warmup + annealing recipe follows this list). How many steps to use for warmup and annealing is probably problem specific and requires some hyperparameter tuning for optimal results.
- Why is my loss choppy?
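To make that advice concrete, here is a minimal sketch (not taken from either repository) of the Adam/AdamW recipe described above, using only the schedulers built into torch.optim.lr_scheduler; the toy model, learning rate, and step counts are illustrative assumptions. With an optimizer such as RAdam, the warmup stage could simply be dropped.

```python
# Sketch only: AdamW with linear warmup then cosine annealing, using built-in
# torch.optim.lr_scheduler classes. Model, data, lr, and step counts are made up.
import torch

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

warmup_steps, total_steps = 500, 10_000
warmup = torch.optim.lr_scheduler.LinearLR(
    optimizer, start_factor=0.01, total_iters=warmup_steps)      # ramp the lr up
cosine = torch.optim.lr_scheduler.CosineAnnealingLR(
    optimizer, T_max=total_steps - warmup_steps)                  # then anneal it
scheduler = torch.optim.lr_scheduler.SequentialLR(
    optimizer, schedulers=[warmup, cosine], milestones=[warmup_steps])

for step in range(total_steps):
    x, y = torch.randn(32, 10), torch.randn(32, 1)                # toy batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    scheduler.step()   # both schedulers are stepped once per optimizer step
```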
Posts mentioning pytorch_warmup
- Using both Learning Rate Warmup and Learning Rate Decay Schedule in PyTorch
  I'm currently using this library for learning rate warmup, specifically LinearWarmup(), which simply ramps the learning rate up from 0 to max_lr over a given number of steps (a combined warmup + decay sketch follows this list).
- [N] LR Warmup for PyTorch
  pytorch_warmup v0.1.0 was released.
- LR Warmup for PyTorch
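Following the first post above, here is a minimal sketch of combining pytorch_warmup's LinearWarmup() with a decay schedule, assuming the v0.1.x API (the warmup_period argument and the dampening() context manager); the toy model, learning rate, and step counts are illustrative assumptions.

```python
# Sketch only: pytorch_warmup's LinearWarmup() combined with cosine decay,
# assuming the v0.1.x API (warmup_period, dampening()). The model, learning
# rate, and step counts are illustrative assumptions.
import torch
import pytorch_warmup as warmup

model = torch.nn.Linear(10, 1)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=0.01)

num_steps = 10_000
lr_scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=num_steps)
warmup_scheduler = warmup.LinearWarmup(optimizer, warmup_period=2_000)  # ~0 -> max_lr

for step in range(num_steps):
    x, y = torch.randn(32, 10), torch.randn(32, 1)                       # toy batch
    loss = torch.nn.functional.mse_loss(model(x), y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    with warmup_scheduler.dampening():   # scales the lr set by the decay schedule
        lr_scheduler.step()
```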
What are some alternatives?
ML-Optimizers-JAX - Toy implementations of some popular ML optimizers using Python/JAX
pytorch-lightning - Build high-performance AI models with PyTorch Lightning (organized PyTorch). Deploy models with Lightning Apps (organized Python to build end-to-end ML systems). [Moved to: https://github.com/Lightning-AI/lightning]
AdaBound - An optimizer that trains as fast as Adam and generalizes as well as SGD.
pytorch-lightning - Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
pytorch-optimizer - torch-optimizer: a collection of optimizers for PyTorch
MockingBird - 🚀 AI voice cloning: clone a voice in 5 seconds to generate arbitrary speech in real time
DemonRangerOptimizer - Quasi Hyperbolic Rectified DEMON Adam/Amsgrad with AdaMod, Gradient Centralization, Lookahead, iterative averaging and decorrelated Weight Decay
transformers - 🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
Best-Deep-Learning-Optimizers - Collection of the latest and greatest deep learning optimizers for PyTorch, suitable for CNN and NLP tasks
yolov5 - YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
deepnet - Educational deep learning library in plain NumPy.
iamusica_training - ONSETS&VELOCITIES real-time piano detection - PyTorch training [EUSIPCO2023]