higher
higher is a PyTorch library that lets users obtain higher-order gradients over losses spanning training loops rather than individual training steps. (by facebookresearch)
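To make that description concrete, here is a minimal sketch of higher's documented inner-loop pattern. The model, data, inner-loop length, and learning rate are placeholders; `copy_initial_weights=False` is the variant typically used for MAML-style meta-gradients, where the outer gradient should reach the original parameters.

```python
# Minimal sketch (placeholder model/data): differentiate through several SGD steps
# with higher, then backpropagate an outer loss to the original parameters.
import torch
import higher

model = torch.nn.Linear(4, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)
x, y = torch.randn(8, 4), torch.randn(8, 1)

# copy_initial_weights=False lets the outer gradient reach model's own parameters.
with higher.innerloop_ctx(model, opt, copy_initial_weights=False) as (fmodel, diffopt):
    for _ in range(5):                                # differentiable inner training loop
        inner_loss = ((fmodel(x) - y) ** 2).mean()
        diffopt.step(inner_loss)                      # functional update, keeps the graph
    outer_loss = ((fmodel(x) - y) ** 2).mean()
    outer_loss.backward()                             # higher-order grads w.r.t. model.parameters()
```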
SGD-OGR-Hessian-estimator
SGD (stochastic gradient descent) with an OGR (online gradient regression) Hessian estimator (by JarekDuda)
| | higher | SGD-OGR-Hessian-estimator |
|---|---|---|
| Mentions | 2 | 8 |
| Stars | 1,561 | 10 |
| Growth | - | - |
| Activity | 0.0 | 4.9 |
| Latest commit | about 2 years ago | about 1 year ago |
| Language | Python | Mathematica |
| License | Apache License 2.0 | MIT License |
The number of mentions indicates the total number of mentions that we've tracked plus the number of user-suggested alternatives.
Stars - the number of stars that a project has on GitHub. Growth - month-over-month growth in stars.
Activity is a relative number indicating how actively a project is being developed. Recent commits have higher weight than older ones.
For example, an activity of 9.0 indicates that a project is amongst the top 10% of the most actively developed projects that we are tracking.
higher
Posts with mentions or reviews of higher. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-28.
SGD-OGR-Hessian-estimator
Posts with mentions or reviews of SGD-OGR-Hessian-estimator. We have used some of these posts to build our list of alternatives and similar projects. The last one was on 2022-11-29.
- Can you realistically write your own neural network training optimizer in Mathematica?
I have developed a new approach for an optimizer (sources and article: https://github.com/JarekDuda/SGD-OGR-Hessian-estimator): estimating the Hessian from online linear regression of gradients, in an evolving, locally interesting subspace.
- [R] SGD augmented with 2nd order information from seen sequence of gradients?
- Improving gradient descent convergence e.g. based on local trend of gradients?
Regarding the comparison with momentum, here is the same scenario using the largest learning rate that does not escape to infinity; it reaches roughly 50x worse values after 30 steps: https://github.com/JarekDuda/SGD-OGR-Hessian-estimator/raw/main/momentum.png
- SGD augmented with 2nd order information from seen sequence of gradients?
Here the Hessian is estimated from linear regression of the seen gradients; source: https://github.com/JarekDuda/SGD-OGR-Hessian-estimator (a rough sketch of this idea follows after these posts).
- [R] SGD augmented with 2nd order information from seen sequence of gradients - for nasty Beale function starts approaching in ~10 steps
- SGD augmented with 2nd order information from seen sequence of gradients - for nasty Beale function starts approaching in ~10 steps
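The posts above describe estimating the Hessian from a linear regression of the gradients seen along the trajectory, using the Beale function (global minimum at (3, 0.5)) as a test case. The following is only a rough NumPy illustration of that idea, not the author's OGR method (which uses online, exponentially weighted regression in an evolving subspace); the warm-up length, curvature damping, and step cap are arbitrary choices made for the sketch.

```python
# Rough illustration only (NOT the OGR implementation): estimate the Hessian by
# least-squares regression of the gradients seen so far against the visited points,
# then take a damped Newton-like step on the Beale function.
import numpy as np

def beale(p):
    x, y = p
    return ((1.5 - x + x * y) ** 2 + (2.25 - x + x * y ** 2) ** 2
            + (2.625 - x + x * y ** 3) ** 2)

def beale_grad(p):
    x, y = p
    t1 = 1.5 - x + x * y
    t2 = 2.25 - x + x * y ** 2
    t3 = 2.625 - x + x * y ** 3
    gx = 2 * t1 * (y - 1) + 2 * t2 * (y ** 2 - 1) + 2 * t3 * (y ** 3 - 1)
    gy = 2 * t1 * x + 2 * t2 * 2 * x * y + 2 * t3 * 3 * x * y ** 2
    return np.array([gx, gy])

theta = np.array([1.0, 1.0])                       # arbitrary starting point
points, grads = [], []

for step in range(30):
    g = beale_grad(theta)
    points.append(theta.copy())
    grads.append(g.copy())
    if len(points) >= 4:
        # Model the seen gradients as affine in position:
        #   grad_i - mean_grad ~ H @ (theta_i - mean_theta)
        # lstsq solves T @ X ~ G for X ~ H^T; symmetrizing gives the Hessian estimate.
        T = np.array(points) - np.mean(points, axis=0)
        G = np.array(grads) - np.mean(grads, axis=0)
        H, *_ = np.linalg.lstsq(T, G, rcond=None)
        H = 0.5 * (H + H.T)                        # symmetrize the estimate
        w, V = np.linalg.eigh(H)
        w = np.abs(w) + 1e-3                       # saddle-safe positive curvature
        d = V @ ((V.T @ g) / w)                    # Newton-like direction
        d *= min(1.0, 1.0 / (np.linalg.norm(d) + 1e-12))   # crude cap on step length
        theta = theta - d
    else:
        theta = theta - 0.01 * g                   # warm-up: plain gradient steps

print("theta =", theta, "f(theta) =", beale(theta))   # Beale's minimum is at (3, 0.5)
```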
What are some alternatives?
When comparing higher and SGD-OGR-Hessian-estimator you can also consider the following projects:
backpack (BackPACK) - a backpropagation package built on top of PyTorch which efficiently computes quantities other than the gradient.
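For orientation, here is a small sketch of BackPACK's documented extend/backpack pattern for computing per-sample gradients alongside the usual gradient; the model, loss, and data are placeholders.

```python
# Sketch of BackPACK usage (placeholder model/data): extend the model and loss,
# then request per-sample gradients during the backward pass.
import torch
from backpack import backpack, extend
from backpack.extensions import BatchGrad

model = extend(torch.nn.Linear(4, 2))
lossfunc = extend(torch.nn.CrossEntropyLoss())

X, y = torch.randn(8, 4), torch.randint(0, 2, (8,))
loss = lossfunc(model(X), y)
with backpack(BatchGrad()):
    loss.backward()

for p in model.parameters():
    print(p.grad.shape, p.grad_batch.shape)   # p.grad_batch holds the per-sample gradients
```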