diff --git a/_projects/2024-08-16-project-6.md b/_projects/2024-08-16-project-6.md
index 4ac25bd42eab4..31ed7990a0d63 100644
--- a/_projects/2024-08-16-project-6.md
+++ b/_projects/2024-08-16-project-6.md
@@ -11,14 +11,5 @@ date: 2024-08-16
 location: "Seattle, WA, USA"
 ---
 
-Advances in large-scale machine learning have been driven by simple, lightweight iterative algorithms. This dissertation studies a broad class of such algorithms that operate on large random matrices, whose coordinate processes interact through mean-field dynamics. Examples include stochastic gradient-based methods for optimizing the weight matrices of deep neural networks (DNNs), Markov chain Monte Carlo (MCMC)
-algorithms for sampling from random matrix models, and the forward pass through the weight matrices of a DNN's layers.
-
-We demonstrate that, under mild assumptions, iterative algorithms and dynamics on large finite-dimensional matrices exhibit well-defined analytical scaling limits as the algorithm step sizes approach zero and the dimension of the matrix-valued iterates grows to infinity. These scaling limits can be described as processes on infinite exchangeable arrays (IEAs) and characterized analytically as smooth curves on the metric space of graphons and measure-valued graphons (MVGs). The limiting process can also be described via McKean-Vlasov-type stochastic differential equations (SDEs), similar to those studied in the theory of interacting particle systems.
-
-In deriving these results, we develop a theory of gradient flows on graphons. We introduce new metrics on the space of MVGs that provide a natural notion of convergence for our limit theorems, equivalent to the convergence of IEAs. The analysis reveals that the scaling limits of popular algorithms such as stochastic gradient descent (SGD) and an MCMC sampling algorithm coincide and are gradient flows on the space of graphons, uncovering an interesting connection between sampling and optimization. The analysis also demonstrates the propagation-of-chaos phenomenon in large-scale systems: as the system size grows, the coordinate evolutions become statistically independent.
-
-Finally, we apply these analytical tools to the feedforward dynamics of a linear residual neural network as its depth and width grow to infinity. We again find propagation of chaos at play: as the network grows, the evolution of any finite set of independently chosen neurons, from the input layer to the output layer, becomes independent for any fixed input. Moreover, this neuron evolution can be described as a Gaussian process, with drift and diffusion components fully determined by the weights of the limiting network. This yields an optimal control framework for the risk minimization problem in such infinitely deep and wide networks.
-
 This work has been accepted as my Ph.D. thesis at the [University of Washington](https://www.washington.edu/){:target="_blank"}. Please find the full text of the thesis [here](https://raghavsomani.github.io/projects/files/thesis.pdf){:target="_blank"}.
 
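
For readers unfamiliar with the limiting objects mentioned in the removed abstract, a McKean-Vlasov-type SDE has the generic textbook form below (a standard sketch, not the thesis's exact equations):

```latex
dX_t = b\big(X_t, \mu_t\big)\,dt + \sigma\big(X_t, \mu_t\big)\,dW_t,
\qquad \mu_t = \mathrm{Law}(X_t),
```

where the drift \(b\) and diffusion \(\sigma\) depend on the law \(\mu_t\) of the process itself, so each coordinate interacts with the rest of the system only through the population distribution.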
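The propagation-of-chaos phenomenon described in the abstract can be illustrated with a minimal toy simulation (not the thesis's algorithms): in a mean-field interacting particle system, any two fixed coordinates decorrelate as the system size grows. The drift below, which pulls each coordinate toward the empirical mean, and the helper names `simulate` and `coord_covariance` are hypothetical choices for illustration only.

```python
import random
import statistics


def simulate(n_particles, n_steps=100, dt=0.01, sigma=0.5, seed=0):
    """Euler-Maruyama simulation of a toy mean-field particle system:
    each coordinate is pulled toward the empirical mean and perturbed
    by independent Gaussian noise (an illustrative drift, not the
    thesis's dynamics)."""
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    for _ in range(n_steps):
        m = statistics.fmean(x)  # mean-field interaction term
        x = [xi - (xi - m) * dt + sigma * dt ** 0.5 * rng.gauss(0.0, 1.0)
             for xi in x]
    return x


def coord_covariance(n_particles, n_runs=150):
    """Sample covariance of coordinates 0 and 1 at the final time,
    estimated over independent runs of the system."""
    a, b = [], []
    for r in range(n_runs):
        x = simulate(n_particles, seed=r)
        a.append(x[0])
        b.append(x[1])
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    return statistics.fmean((u - ma) * (v - mb) for u, v in zip(a, b))


# Propagation of chaos: fixed coordinates decorrelate as the system grows.
print(coord_covariance(2))    # noticeable coupling in a tiny system
print(coord_covariance(200))  # near-zero covariance in a large system
```

With only two particles the shared empirical mean couples the coordinates strongly, while in a large system each coordinate effectively interacts with a deterministic population average, so the covariance shrinks toward zero.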