Wilfred Hughes
May 4, 2019 at 12:11
An excellent overview of the different gradient descent optimization algorithms, and a nice example of content that is available as both a responsive website and a PDF on arXiv:
https://ruder.io/optimizing-gradient-descent/
An overview of gradient descent optimization algorithms
Gradient descent is the preferred way to optimize neural networks and many other machine learning algorithms but is often used as a black box. This post explores how many of the most popular gradient-based optimization algorithms such as Momentum, Adagrad, and Adam actually work.
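To give a flavour of what the post covers, here's a minimal NumPy sketch of the Adam update rule (one of the algorithms it discusses). This is my own illustration, not code from the linked post; the function name and hyperparameter defaults are just the commonly cited ones:

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update step (Kingma & Ba, 2015).

    m and v are running estimates of the first and second moments of the
    gradient; t is the 1-indexed step count used for bias correction.
    """
    m = beta1 * m + (1 - beta1) * grad        # update first-moment estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # update second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias-correct the estimates,
    v_hat = v / (1 - beta2 ** t)              # which start at zero
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Minimise f(theta) = theta^2, whose gradient is 2 * theta.
theta = np.array([5.0])
m = np.zeros_like(theta)
v = np.zeros_like(theta)
for t in range(1, 2001):
    grad = 2 * theta
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
print(theta)  # converges towards 0
```

Plain gradient descent would use `grad` directly; Adam's per-parameter moment estimates are what give it the adaptive step sizes the post explains.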