Implementation and Analysis of the Stochastic Generalized Gauss-Newton Method for Training Deep Neural Networks
Deep neural networks (DNNs) are powerful mathematical models capable of learning arbitrarily complex functions given a sufficiently representative set of data points. Over the past decade, DNNs have been successfully deployed in many different applications, ranging from computer vision to finance.
Despite these many success stories, training deep neural networks is still a difficult task, as it requires solving a large-scale nonconvex unconstrained optimization problem with an unknown and complex landscape. This task is typically addressed with stochastic first-order methods: the workhorse optimization algorithms used for training are stochastic gradient descent and its variants, which ensure convergence to a stationary point under certain conditions.
These methods consist of computationally cheap iterations, but the generated iterates generally progress very slowly towards a stationary point, especially in the presence of pathological curvature regions. In addition, they require time- and resource-consuming tuning procedures to adjust the learning rate over the course of the optimization. Second-order methods, which include curvature information in their update rules, would mitigate the effects of poor conditioning and sensitivity to hyperparameters, but at the price of computationally more expensive, if not intractable, iterations. Moreover, direct implementations of these methods would not scale to the size of the considered problems or of modern datasets. The deployment of second-order methods for training state-of-the-art DNNs therefore remains an open problem.
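To make this contrast concrete, the following is a small illustrative sketch (not part of the project material) on a toy ill-conditioned quadratic: a plain gradient step must use a learning rate small enough for the stiffest direction and therefore crawls along the flat one, while a Newton-type step that rescales the gradient by the inverse Hessian reaches the minimizer in one iteration. The loss, curvature values and learning rate below are hypothetical.

```python
# Illustrative sketch: plain gradient step vs. Newton-type step on an
# ill-conditioned quadratic. All names and values are hypothetical.
import jax
import jax.numpy as jnp

# Ill-conditioned quadratic loss: curvature differs by a factor of 1e4.
H = jnp.diag(jnp.array([1.0, 1e4]))

def loss(w):
    return 0.5 * w @ H @ w

grad_fn = jax.grad(loss)
w = jnp.array([1.0, 1.0])

# First-order step: a single learning rate must be stable in the stiffest
# direction (lr < 2/1e4), so progress in the flat direction is tiny.
sgd_step = w - 1e-4 * grad_fn(w)

# Second-order (Newton-type) step: rescale the gradient by the inverse
# Hessian, computed exactly here since the problem is tiny.
Hw = jax.hessian(loss)(w)
newton_step = w - jnp.linalg.solve(Hw, grad_fn(w))

print(sgd_step)     # barely moves along the flat direction
print(newton_step)  # reaches the minimizer (0, 0) in a single step
```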
The goal of this project is to investigate the use of second-order methods for training neural networks. In particular, the project focuses on the stochastic generalized Gauss-Newton (SGN) method, and the main task consists of a flexible, robust and easy-to-use implementation of this method in JAX, the machine learning framework developed by Google Research.
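As a rough sketch of what such an implementation might involve (under assumptions of my own, not the project's actual code), the snippet below performs one SGN-style step in JAX: the generalized Gauss-Newton matrix G = Jᵀ H_L J is never formed explicitly; only G-vector products are computed with jax.jvp / jax.vjp, and the damped system (G + λI) p = −g is solved matrix-free with conjugate gradient. The toy model, data shapes, damping value and function names are placeholders.

```python
# Hypothetical sketch of one stochastic generalized Gauss-Newton (SGN) step
# in JAX, using matrix-free GGN-vector products and conjugate gradient.
import jax
import jax.numpy as jnp
from jax.flatten_util import ravel_pytree

def model(params, x):
    # Toy linear model standing in for a DNN forward pass.
    return x @ params["W"] + params["b"]

def loss_fn(z, y):
    # Convex per-output loss (here: mean squared error).
    return 0.5 * jnp.mean((z - y) ** 2)

def sgn_step(params, x, y, damping=1e-3):
    flat_params, unravel = ravel_pytree(params)

    def net(p_flat):
        return model(unravel(p_flat), x)

    z = net(flat_params)
    grad = jax.grad(lambda p: loss_fn(net(p), y))(flat_params)

    def ggn_matvec(v):
        # J v: directional derivative of the network output along v.
        _, Jv = jax.jvp(net, (flat_params,), (v,))
        # H_L (J v): Hessian of the loss w.r.t. the network output applied
        # to Jv, computed as a Hessian-vector product (H_L never formed).
        _, HJv = jax.jvp(jax.grad(lambda z_: loss_fn(z_, y)), (z,), (Jv,))
        # J^T (H_L J v): pull back through the transposed Jacobian.
        _, vjp_fn = jax.vjp(net, flat_params)
        (GtV,) = vjp_fn(HJv)
        return GtV + damping * v

    # Solve (G + damping * I) p = -grad with matrix-free conjugate gradient.
    step, _ = jax.scipy.sparse.linalg.cg(ggn_matvec, -grad, maxiter=50)
    return unravel(flat_params + step)

# Hypothetical usage on random data:
key = jax.random.PRNGKey(0)
params = {"W": jax.random.normal(key, (5, 3)), "b": jnp.zeros(3)}
x = jax.random.normal(key, (32, 5))
y = jax.random.normal(key, (32, 3))
params = sgn_step(params, x, y)
```

In a full implementation this step would be computed on mini-batches, the damping and the number of CG iterations would be adapted during training, and the GGN products would be batched and jit-compiled; the sketch only illustrates the matrix-free structure of the method.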