Since the gradient of the error becomes shallower as the descent nears
convergence, the updates naturally shrink as the descent approaches the
error function's minimum. However, too large a step size will cause the
descent to diverge, and too small a step size will make the descent
extremely slow. Unfortunately, choosing a good step size is a matter of
trial and error.
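
To make the effect of the step size concrete, here is a minimal sketch in Python. It assumes a toy error function f(x) = x**2 (gradient 2*x), which is not taken from the text; the function name gradient_descent, the starting point, and the three step sizes are likewise illustrative choices rather than recommended values.

```python
def gradient_descent(step_size, start=1.0, num_steps=50):
    """Run plain gradient descent on the toy error f(x) = x**2.

    The gradient of f is 2*x and the minimum is at x = 0.
    All names and default values here are illustrative assumptions.
    """
    x = start
    for _ in range(num_steps):
        gradient = 2.0 * x
        # The update is the step size times the gradient, so it
        # shrinks on its own as the gradient flattens near the minimum.
        x -= step_size * gradient
    return x


# Same starting point and step count, three different step sizes.
too_large = gradient_descent(step_size=1.1)    # overshoots further each step
too_small = gradient_descent(step_size=0.001)  # has barely left the start
moderate = gradient_descent(step_size=0.1)     # ends very close to 0

print("too large:", too_large)
print("too small:", too_small)
print("moderate: ", moderate)
```

On this toy problem each update multiplies x by (1 - 2 * step_size), so any step size above 1.0 makes the magnitude of x grow at every step, while a very small step size leaves x almost unchanged after 50 steps. For a real error function the usable range is not known in advance, which is why the step size usually has to be tuned by experiment.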