Paper Detail

Paper Title Linearly Convergent Algorithms for Learning Shallow Residual Networks
Paper Identifier TH1.R3.1
Authors Gauri Jagatap, Chinmay Hegde, Iowa State University, United States
Session Information Theory and Learning II
Location Monge, Level 3
Session Time Thursday, 11 July, 09:50 - 11:10
Presentation Time Thursday, 11 July, 09:50 - 10:10
Abstract We propose and analyze algorithms for training ReLU networks with skipped connections. Skipped connections are the key feature of residual networks (or ResNets), which have been shown to provide superior performance in deep learning applications. We analyze two approaches for training such networks, gradient descent and alternating minimization, and compare the convergence criteria of the two methods. We show that under typical (Gaussianity) assumptions on the $d$-dimensional input data, both gradient descent and alternating minimization provably converge at a linear rate from any sufficiently good initialization; moreover, we show that a simple ``identity'' initialization suffices. Furthermore, we provide statistical upper bounds indicating that $n=\widetilde{O}(d^3)$ samples suffice to achieve this convergence rate. To our knowledge, these constitute the first global parameter recovery guarantees for shallow ResNet-type networks with ReLU activations.
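
The following is a minimal illustrative sketch, not the authors' code: it runs plain gradient descent with the "identity" initialization (residual weights started at zero) on a toy shallow residual ReLU model with Gaussian inputs. The specific model form y = mean_j ReLU(((I + W) x)_j), the step size, and the sample size are assumptions made only for this demonstration and need not match the paper's exact setup or guarantees.

import numpy as np

rng = np.random.default_rng(0)
d, n = 20, 5000                      # input dimension, number of samples (ad hoc)

def forward(W, X):
    """Mean of ReLU units applied to the residual map (I + W) x."""
    pre = X @ (np.eye(d) + W).T      # pre-activations, shape (n, d)
    return np.maximum(pre, 0.0).mean(axis=1)

# Ground-truth residual weights (small perturbation of the identity) and data.
W_star = 0.3 * rng.standard_normal((d, d)) / np.sqrt(d)
X = rng.standard_normal((n, d))      # Gaussian inputs, as in the paper's assumption
y = forward(W_star, X)

# "Identity" initialization: the residual part W starts at zero,
# so the network begins as the identity map followed by the ReLU.
W = np.zeros((d, d))
lr = 0.5                             # step size chosen ad hoc for this demo

for t in range(200):
    pre = X @ (np.eye(d) + W).T
    act = np.maximum(pre, 0.0)
    resid = act.mean(axis=1) - y     # prediction error, shape (n,)
    # Gradient of 0.5 * mean squared error with respect to W.
    grad = ((resid[:, None] * (pre > 0)).T @ X) / (n * d)
    W -= lr * grad
    if t % 50 == 0:
        print(f"iter {t:3d}  loss {0.5 * np.mean(resid**2):.3e}  "
              f"param err {np.linalg.norm(W - W_star):.3e}")

In this sketch the printed parameter error tracks how close the learned residual weights W are to W_star; the paper's analysis concerns when such iterates provably converge linearly to the true parameters under the stated assumptions.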