Generalization of Neural Networks: A Hessian View
Expeditions in Experiential AI seminar with Hongyang Zhang
Abstract
The generalization properties of neural networks are central to understanding their learning capabilities and limitations. Researchers often focus on a specific neural network model and analyze its sample complexity under a learning algorithm such as SGD. It remains unclear, however, whether these insights carry over to more complex data distributions and larger neural networks, such as foundation models. A complementary approach that scales to complex data and large models would therefore be valuable.
In this talk, Hongyang 'Ryan' Zhang will introduce a Hessian perspective on this issue. The use of the Hessian in neural networks has roots in early research on second-order methods. Zhang will describe a Hessian-based measure of generalization derived through the PAC-Bayes analysis framework (ICML’22). He will show how the trace of the Hessian of the loss can yield informative estimates of generalization gaps. This perspective also motivates new optimization algorithms that reduce the sharpness of loss surfaces (forthcoming in TMLR’24). Finally, Zhang will present a new spectral-norm generalization bound for graph neural networks, which improves on previous results that depend on the maximum degree of the graphs (AISTATS’23).
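To give a concrete sense of what "measuring the trace of the Hessian of the loss" can involve, here is a minimal sketch in Python. It is not the speaker's method. It uses Hutchinson's estimator, a standard randomized technique that approximates the trace from Hessian-vector products without ever forming the full Hessian. The logistic-regression model, the synthetic data, the function names, and the finite-difference Hessian-vector product are all assumptions chosen for illustration. For a real neural network one would typically obtain Hessian-vector products through automatic differentiation instead.

```python
import numpy as np


def logistic_loss_grad(w, X, y):
    """Gradient of the mean logistic loss; labels y are in {0, 1}."""
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return X.T @ (p - y) / len(y)


def exact_hessian_trace(w, X):
    """Closed-form trace of the logistic-loss Hessian, used only as a check.

    The Hessian is X^T diag(p * (1 - p)) X / n, so its trace is
    sum_i p_i * (1 - p_i) * ||x_i||^2 / n.
    """
    p = 1.0 / (1.0 + np.exp(-X @ w))
    return float(np.sum(p * (1.0 - p) * np.sum(X ** 2, axis=1)) / X.shape[0])


def hutchinson_trace(grad_fn, w, num_probes=200, eps=1e-4, seed=0):
    """Estimate tr(H) as the average of v^T H v over random sign vectors v.

    H v is approximated by a central finite difference of the gradient,
    which is an illustrative stand-in for an autodiff Hessian-vector product.
    """
    rng = np.random.default_rng(seed)
    total = 0.0
    for _ in range(num_probes):
        v = rng.choice([-1.0, 1.0], size=w.shape)  # Rademacher probe
        hv = (grad_fn(w + eps * v) - grad_fn(w - eps * v)) / (2.0 * eps)
        total += float(v @ hv)
    return total / num_probes


# Hypothetical synthetic problem: 50 samples, 5 features.
rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))
y = (rng.random(50) < 0.5).astype(float)
w = 0.1 * rng.normal(size=5)

estimate = hutchinson_trace(lambda u: logistic_loss_grad(u, X, y), w)
exact = exact_hessian_trace(w, X)
print(f"Hutchinson estimate: {estimate:.4f}, exact trace: {exact:.4f}")
```

The estimator needs only gradient evaluations, which is why trace-based sharpness measures remain practical for models whose full Hessian is far too large to store. Increasing `num_probes` reduces the variance of the estimate at the cost of more gradient computations.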
Bio
Hongyang Zhang is an assistant professor of computer science at Northeastern University, Boston. His research interests lie at the intersection of machine learning, design and analysis of algorithms, learning theory, data, networks, and language modeling. He received a PhD in Computer Science from Stanford University and a BE in Computer Science from Shanghai Jiao Tong University. He spent about a year as a postdoc in the statistics and data science department at the University of Pennsylvania. He has served as an area chair (and program committee member) for ICML, AISTATS, COLT, ALT, and AAAI, as well as an action editor for the Journal of Data-centric Machine Learning Research.