On the Variance of the Adaptive Learning Rate and Beyond
Why do you think that https://github.com/davda54/sam is a good alternative to RAdam?