Hidden Convexity of Deep Neural Networks: From Sparse Signal Processing to Geometric Algebra
Abstract
In this talk, we introduce an analysis of deep neural networks through convex optimization, sparse signal processing, and geometric (Clifford) algebra. We begin with exact convex optimization formulations for ReLU neural networks, which show that deep networks can be trained to global optimality via convex programs. Our results further establish an equivalent characterization of neural networks as high-dimensional convex Lasso models, traditionally employed in sparse signal processing: these models fit data using a discrete dictionary of features under sparsity-inducing convex regularization. Our framework provides an intuitive geometric interpretation in which the optimal neurons represent signed volumes of parallelotopes formed by data vectors. Specifically, we show that the Lasso dictionary is constructed from a discrete set of wedge products of input samples, with deeper network architectures introducing geometric reflections of these features. This analysis also reveals that standard convolutional neural networks can be globally optimized in fully polynomial time. Numerical simulations validate our claims, illustrating that the proposed convex approach is faster and more reliable than standard local search heuristics such as stochastic gradient descent and its variants. We also discuss extensions to batch normalization, generative adversarial networks, transformers, and diffusion models.
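The "network training as Lasso" idea from the abstract can be illustrated with a toy sketch. This is not the talk's exact construction: the choice of one-dimensional data, the ReLU breakpoint dictionary placed at the data points, and the plain ISTA (proximal gradient) solver are all illustrative assumptions. The sketch only conveys the general recipe of fitting data with a discrete set of candidate features under an l1 penalty that selects a sparse subset of "neurons."

```python
import numpy as np

# Illustrative sketch (assumptions, not the talk's construction): fit 1-D data
# with a fixed dictionary of ReLU features (x - t_k)_+, breakpoints t_k placed
# at the data points, plus a bias column. An l1 penalty (Lasso) selects a
# sparse subset of these candidate neurons. Lasso is solved by plain ISTA.

rng = np.random.default_rng(0)
x = np.sort(rng.uniform(-1, 1, 30))
y = np.sin(3 * x)                      # target values to fit

# Dictionary: one ReLU feature per candidate breakpoint, plus a bias column.
A = np.maximum(x[:, None] - x[None, :], 0.0)
A = np.hstack([A, np.ones((len(x), 1))])

lam = 1e-3                             # l1 regularization strength
L = np.linalg.norm(A, 2) ** 2          # Lipschitz constant of the gradient
z = np.zeros(A.shape[1])
for _ in range(5000):                  # ISTA: gradient step + soft threshold
    g = A.T @ (A @ z - y)
    w = z - g / L
    z = np.sign(w) * np.maximum(np.abs(w) - lam / L, 0.0)

print("active neurons:", int(np.sum(np.abs(z) > 1e-6)), "of", A.shape[1])
print("fit RMSE:", float(np.sqrt(np.mean((A @ z - y) ** 2))))
```

Because the objective is convex, any solver reaching the optimum finds the globally best sparse combination of dictionary features, in contrast to local search over the nonconvex neural network parameterization.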
Bio
Mert Pilanci is an assistant professor of Electrical Engineering at
Stanford University. He received his Ph.D. in Electrical Engineering
and Computer Science from UC Berkeley in 2016. Prior to joining
Stanford, he was an assistant professor of Electrical Engineering and
Computer Science at the University of Michigan. In 2017, he was a
Math+X postdoctoral fellow working with Emmanuel Candès at Stanford
University. Mert’s research interests are in neural networks, machine
learning, optimization, and signal processing. His group develops
theory and algorithms for solving large-scale optimization problems in
machine learning. His research also seeks to develop safe and
interpretable artificial intelligence and information-theoretic
foundations of distributed computing.