Spark in me - Internet, data science, math, deep learning, philosophy

snakers4 @ telegram, 1797 members, 1726 posts since 2016

All this - lost like tears in rain.

Data science, ML, a bit of philosophy and math. No bs.

Our website
- spark-in.me
Our chat
- t.me/joinchat/Bv9tjkH9JHYvOr92hi5LxQ
DS courses review
- goo.gl/5VGU5A
- goo.gl/YzVUKf

February 27, 07:50

New variation of Adam?

- [Website](www.luolc.com/publications/adabound/);

- [Code](github.com/Luolc/AdaBound) - see the usage sketch after this post;

- Eliminate the generalization gap between adaptive methods and SGD;

- TL;DR: A Faster And Better Optimizer with Highly Robust Performance;

- Applies dynamic bounds on learning rates, inspired by gradient clipping;

- Not very sensitive to hyperparameters, especially compared with SGD(M);

- Tested on MNIST, CIFAR, Penn Treebank - no serious datasets;

#deep_learning
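
Going by the repo's README, wiring it in looks roughly like the snippet below: `lr` plays the role of Adam's initial step size and `final_lr` is the SGD-like rate the bounds converge to. The argument names are taken from the README, so treat them as an assumption if the API has since changed.

```python
# pip install adabound  - package name as listed in the repo, assumed current
import torch.nn as nn
import adabound

model = nn.Linear(10, 2)  # toy model just to have some parameters

# lr behaves like Adam's initial step size; final_lr is the SGD-like rate
# that the dynamic bounds squeeze towards over training.
optimizer = adabound.AdaBound(model.parameters(), lr=1e-3, final_lr=0.1)
```

After that it is used like any other `torch.optim` optimizer (zero_grad / backward / step).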

Adaptive Gradient Methods with Dynamic Bound of Learning Rate

Abstract

Adaptive optimization methods such as AdaGrad, RMSProp and Adam have been proposed to achieve a rapid training process with an element-wise scaling term on learning rates. Though prevailing, they are observed to generalize poorly compared with SGD or even fail to converge due to unstable and extreme learning rates. Recent work has put forward some algorithms such as AMSGrad to tackle this issue but they failed to achieve considerable improvement over existing methods.
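
To make the "dynamic bound" idea concrete, here is a rough NumPy sketch of an AdaBound-style update as I read the paper: an Adam step whose element-wise step size is clipped into bounds that start wide and shrink towards a single SGD-like rate. The function name, the `gamma` schedule and the default values are illustrative, not the authors' exact code.

```python
import numpy as np

def adabound_step(theta, grad, state, lr=1e-3, final_lr=0.1,
                  betas=(0.9, 0.999), gamma=1e-3, eps=1e-8):
    """Sketch of one AdaBound-style update on a NumPy parameter vector."""
    b1, b2 = betas
    state['t'] += 1
    t = state['t']
    # Adam-style first and second moment estimates with bias correction
    state['m'] = b1 * state['m'] + (1 - b1) * grad
    state['v'] = b2 * state['v'] + (1 - b2) * grad ** 2
    m_hat = state['m'] / (1 - b1 ** t)
    v_hat = state['v'] / (1 - b2 ** t)

    # Dynamic bounds: very loose at t = 1 (Adam-like behaviour),
    # both converge to final_lr as t grows (SGD-like behaviour).
    lower = final_lr * (1 - 1 / (gamma * t + 1))
    upper = final_lr * (1 + 1 / (gamma * t))

    # Element-wise step size, clipped into [lower, upper]
    step = np.clip(lr / (np.sqrt(v_hat) + eps), lower, upper)
    return theta - step * m_hat

state = {'t': 0, 'm': np.zeros(3), 'v': np.zeros(3)}
theta = np.ones(3)
for _ in range(5):
    theta = adabound_step(theta, grad=theta - 0.5, state=state)  # toy quadratic loss
print(theta)
```

The clipping is what kills the "unstable and extreme learning rates" mentioned in the abstract, while the shrinking bounds give the smooth transition from adaptive behaviour to plain SGD.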