Understanding Large ML Models through the Structure of Feature Covariance
Prof. Samet Oymak, Department of Electrical and Computer Engineering, UCR

An overarching goal in machine learning is to enable accurate statistical inference when the sample size is smaller than the number of parameters. This overparameterized setting is particularly common in deep learning, where large neural nets are routinely trained on relatively small sample sizes with little concern about overfitting. In this talk, we highlight how structure within the data is a catalyst for the empirical success of these large models. After linking deep nets to linear models, we show that the eigen-structure of the feature covariance helps explain empirical phenomena such as noise robustness, the double descent curve, model compression, and the benefits of perfectly fitting the training data. In particular, we highlight that a typical feature covariance has a spiked structure with a few large eigenvalues and many smaller ones. We proceed to discuss: (1) for data with label noise, regularization is useful for restricting the optimization process to the large eigen-directions and reducing overfitting; and (2) for (mostly) noiseless data, incorporating the small eigen-directions is crucial for striking a good bias/variance tradeoff, which in turn explains why larger models work better despite fitting perfectly with no regularization. Finally, we explain how our high-dimensional analysis framework based on Gaussian process theory facilitates these findings.
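To make the "spiked feature covariance" picture concrete, below is a minimal Python sketch. It is illustrative only and not taken from the talk: the synthetic data distribution, the dimensions, and the use of random ReLU features as a stand-in for a wide network's features are all assumptions. It shows that the empirical feature covariance has a handful of large eigenvalues and many small ones, and that ridge regularization attenuates the component along each eigen-direction by the factor lambda_i / (lambda_i + alpha), i.e. it effectively restricts fitting to the large eigen-directions, as in point (1).

```python
import numpy as np

# Minimal sketch (synthetic data; illustrative assumptions, not the talk's exact setup).
rng = np.random.default_rng(0)
n, d, p = 2000, 50, 300            # samples, input dimension, number of random features

# Inputs whose covariance has a few dominant ("spiked") directions and a flat bulk.
spikes = np.array([25.0, 16.0, 9.0])           # a few large variances
bulk = 0.1 * np.ones(d - len(spikes))          # many small variances
X = rng.standard_normal((n, d)) * np.sqrt(np.concatenate([spikes, bulk]))

# Random ReLU features, a common proxy for the features of a wide one-hidden-layer net.
W = rng.standard_normal((d, p)) / np.sqrt(d)
F = np.maximum(X @ W, 0.0)

# Empirical feature covariance and its eigen-spectrum: few large, many small eigenvalues.
Sigma = F.T @ F / n
eigvals = np.linalg.eigvalsh(Sigma)[::-1]      # sorted descending
print("top 5 eigenvalues :", np.round(eigvals[:5], 3))
print("median eigenvalue :", np.round(np.median(eigvals), 4))

# Ridge regression shrinks the solution's component along eigen-direction i by
# lambda_i / (lambda_i + alpha): large eigen-directions pass nearly unchanged,
# small ones are suppressed, which is the mechanism behind point (1) above.
alpha = 1.0
shrinkage = eigvals / (eigvals + alpha)
print("shrinkage, largest eigen-direction:", np.round(shrinkage[0], 3))
print("shrinkage, median eigen-direction :", np.round(shrinkage[len(eigvals) // 2], 3))
```

Conversely, setting alpha to zero in this sketch keeps all eigen-directions, including the small ones, which corresponds to the interpolating, unregularized regime discussed in point (2).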