Critical feature learning in deep neural networks

Fischer, Kirsten; Lindner, Javed; Dahmen, David; Ringel, Zohar; Krämer, Michael; Helias, Moritz

%0 Conference Paper
%A Fischer, Kirsten
%A Lindner, Javed
%A Dahmen, David
%A Ringel, Zohar
%A Krämer, Michael
%A Helias, Moritz
%T Critical feature learning in deep neural networks
%M FZJ-2024-05061
%D 2024
%X A key property of neural networks driving their success is their ability to learn features from data. Understanding feature learning from a theoretical viewpoint is an emerging field with many open questions. In this work we capture finite-width effects with a systematic theory of network kernels in deep non-linear neural networks. We show that the Bayesian prior of the network can be written in closed form as a superposition of Gaussian processes, whose kernels are distributed with a variance that depends inversely on the network width N. A large deviation approach, which is exact in the proportional limit for the number of data points P=αN→∞, yields a pair of forward-backward equations for the maximum a posteriori kernels in all layers at once. We study their solutions perturbatively to demonstrate how the backward propagation across layers aligns kernels with the target. An alternative field-theoretic formulation shows that kernel adaptation of the Bayesian posterior at finite width results from fluctuations in the prior: larger fluctuations correspond to a more flexible network prior and thus enable stronger adaptation to data. We thus find a bridge between the classical edge-of-chaos NNGP theory and feature learning, exposing an intricate interplay between criticality, response functions, and feature scale.
%B The Forty-first International Conference on Machine Learning
%C 21 Jul 2024 - 27 Jul 2024, Wien (Austria)
Y2 21 Jul 2024 - 27 Jul 2024
M2 Wien, Austria
%F PUB:(DE-HGF)24
%9 Poster
%U https://juser.fz-juelich.de/record/1029334
