Momentum can accelerate training, and learning rate schedules can help the optimization process converge. Adaptive learning rates can also accelerate training and relieve some of the pressure of choosing a learning rate and a learning rate schedule. The momentum coefficient is denoted by β, and 0.9 is a commonly used default value for it. The left-hand equation in the picture below is the better one:
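To make the role of β concrete, here is a minimal sketch of gradient descent with momentum on a one-dimensional quadratic. It assumes the exponentially weighted average form of the update, which is one common variant and not necessarily the equation shown in the picture. The function names `sgd_momentum` and `grad`, the learning rate of 0.1, and the toy objective (w - 3)^2 are all illustrative assumptions, not taken from the original.

```python
def sgd_momentum(grad_fn, w0, lr=0.1, beta=0.9, steps=500):
    """Gradient descent with momentum (hypothetical helper for illustration)."""
    w = float(w0)
    v = 0.0  # velocity: running average of past gradients
    for _ in range(steps):
        g = grad_fn(w)
        # beta controls how much of the previous velocity is kept
        v = beta * v + (1.0 - beta) * g
        # step along the smoothed gradient instead of the raw one
        w = w - lr * v
    return w


def grad(w):
    # Gradient of the assumed toy objective f(w) = (w - 3)^2
    return 2.0 * (w - 3.0)


w_final = sgd_momentum(grad, w0=0.0)
print(w_final)  # should be close to the minimizer at w = 3
```

With β = 0.9, each step keeps 90% of the previous velocity and mixes in 10% of the new gradient, so the update direction changes smoothly from step to step.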