Adaptive Moment Estimation (Adam)

(Figure: Adam Example)

Adaptive Moment Estimation, better known as Adam, is another adaptive learning rate method, first published in 2014 by Kingma et al. [1] In addition to storing an exponentially decaying average of past squared gradients $v_t$ like Adadelta and RMSprop, Adam also keeps an exponentially decaying average of past gradients $m_t$, similar to SGD with momentum [2]:

$$m_t = \beta_1 m_{t-1} + (1 - \beta_1) g_t$$

$$v_t = \beta_2 v_{t-1} + (1 - \beta_2) g_t^2$$

$m_t$ is an estimate of the first moment (the mean) and $v_t$ is an estimate of the second moment (the uncentered variance) of the gradients, respectively. As $m_t$ and $v_t$ are initialized as vectors of 0's, the authors of Adam observe that they are biased towards zero, especially during the initial time steps, and especially when the decay rates are small (i.e. $\beta_1$ and $\beta_2$ are close to 1). [2]

They counteract these biases by computing bias-corrected first and second moment estimates:

$$\hat{m}_t = \frac{m_t}{1 - \beta_1^t}$$

$$\hat{v}_t = \frac{v_t}{1 - \beta_2^t}$$
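As a quick numeric illustration (not from the paper): at the first step $t = 1$, with $m_0 = 0$ and $\beta_1 = 0.9$, the raw estimate is $m_1 = (1 - \beta_1) g_1 = 0.1\, g_1$, only a tenth of the actual gradient; dividing by $1 - \beta_1^1 = 0.1$ recovers $\hat{m}_1 = g_1$.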

$\hat{m}_t$ and $\hat{v}_t$ are then used to update the parameters as follows:

$$\theta_{t+1} = \theta_t - \frac{\eta}{\sqrt{\hat{v}_t} + \epsilon}\, \hat{m}_t$$

The authors propose default values of 0.9 for $\beta_1$, 0.999 for $\beta_2$, and $10^{-8}$ for $\epsilon$.
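The update above translates almost line for line into code. Below is a minimal NumPy sketch of a single Adam step following the equations in this section; the function names (`adam_init`, `adam_step`) and the toy quadratic objective are illustrative, not part of any particular library.

```python
import numpy as np

def adam_init(shape):
    # First and second moment vectors start at zero, as in the paper.
    return np.zeros(shape), np.zeros(shape)

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """Perform one Adam update for parameters `theta`; `t` is the 1-indexed step."""
    # Exponentially decaying averages of the gradient and squared gradient
    m = beta1 * m + (1 - beta1) * grad
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias-corrected first and second moment estimates
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Parameter update
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([5.0])
m, v = adam_init(theta.shape)
for t in range(1, 501):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)  # close to 0
```

Note that the moment vectors `m` and `v` persist across steps, which is why they are returned and fed back into the next call.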

[1] Diederik P. Kingma and Jimmy Ba (2014). Adam: A Method for Stochastic Optimization. arXiv preprint arXiv:1412.6980.

[2] Sebastian Ruder (2016). An overview of gradient descent optimization algorithms. arXiv preprint arXiv:1609.04747.

Code

Resources