arXiv:1810.02281v1 [cs.LG] 4 Oct 2018
A CONVERGENCE ANALYSIS OF GRADIENT DESCENT
FOR DEEP LINEAR NEURAL NETWORKS
Sanjeev Arora
Princeton University and Institute for Advanced Study
arora@cs.princeton.edu
Nadav Cohen
Institute for Advanced Study
cohennadav@ias.edu
Noah Golowich
Harvard University
ngolowich@college.harvard.edu
Wei Hu
Princeton University
huwei@cs.princeton.edu
ABSTRACT
We analyze the speed of convergence to global optimum for gradient descent training a
deep linear neural network (parameterized as $x \mapsto W_N \cdots W_1 x$) by minimizing
the $\ell_2$ loss over whitened data. Convergence at a linear rate is guaranteed when the
following hold: (i) the dimensions of hidden layers are at least the minimum of the input
and output dimensions; (ii) the weight matrices at initialization are approximately
balanced; and (iii) the initial loss is smaller than the loss of any rank-deficient
solution. The assumptions on initialization (conditions (ii) and (iii)) are essential, in
the sense that violating any one of them may lead to convergence failure.
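
As a concrete illustration of the setting described above, the following NumPy sketch runs gradient descent on a depth-3 linear network for scalar regression (output dimension 1), minimizing the $\ell_2$ loss over approximately whitened data. This is not the authors' code: the layer widths, data generation, step size, initialization scale, and the use of small random initialization as a stand-in for approximate balancedness are all illustrative assumptions.

```python
import numpy as np

# Hedged sketch of the abstract's setting: gradient descent on a deep linear
# network x -> W_N ... W_1 x with l2 loss over (approximately) whitened data.
rng = np.random.default_rng(0)

d_in, d_out, depth = 10, 1, 3                  # hidden widths >= min(d_in, d_out)
widths = [d_in] + [max(d_in, d_out)] * (depth - 1) + [d_out]
n = 200

# Standard Gaussian inputs approximate whitened data (identity covariance).
X = rng.standard_normal((n, d_in))
w_true = rng.standard_normal((d_in, d_out))
Y = X @ w_true

# Small random initialization is used here as an approximation of the
# "approximately balanced" condition W_{j+1}^T W_{j+1} ~= W_j W_j^T.
scale = 0.1
Ws = [scale * rng.standard_normal((widths[j + 1], widths[j])) for j in range(depth)]

def end_to_end(Ws):
    """End-to-end matrix W_N ... W_1 realized by the deep linear network."""
    W = Ws[0]
    for Wj in Ws[1:]:
        W = Wj @ W
    return W

def loss(Ws):
    W = end_to_end(Ws)
    return 0.5 * np.mean(np.sum((X @ W.T - Y) ** 2, axis=1))

eta = 1e-2
for step in range(10000):
    W = end_to_end(Ws)
    # Gradient of the l2 loss with respect to the end-to-end matrix.
    dW = (X @ W.T - Y).T @ X / n
    # Chain rule through the matrix product gives each layer's gradient:
    # dL/dW_j = (W_N ... W_{j+1})^T dW (W_{j-1} ... W_1)^T.
    grads = []
    for j in range(depth):
        above = np.eye(widths[-1]) if j == depth - 1 else end_to_end(Ws[j + 1:])
        below = np.eye(widths[0]) if j == 0 else end_to_end(Ws[:j])
        grads.append(above.T @ dW @ below.T)
    Ws = [Wj - eta * g for Wj, g in zip(Ws, grads)]

print("final loss:", loss(Ws))
```

Under these (assumed) hyperparameters the loss should decrease toward zero; the paper's result concerns the rate of that decrease when conditions (i)-(iii) hold.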