Critical feature learning in deep neural networks

Fischer, Kirsten; Lindner, Javed; Dahmen, David; Ringel, Zohar; Krämer, Michael; Helias, Moritz
001029334 001__ 1029334
001029334 005__ 20241108205838.0
001029334 037__ $$aFZJ-2024-05061
001029334 1001_ $$0P:(DE-Juel1)180150$$aFischer, Kirsten$$b0$$eCorresponding author$$ufzj
001029334 1112_ $$aThe Forty-first International Conference on Machine Learning$$cWien$$d2024-07-21 - 2024-07-27$$wAustria
001029334 245__ $$aCritical feature learning in deep neural networks
001029334 260__ $$c2024
001029334 3367_ $$033$$2EndNote$$aConference Paper
001029334 3367_ $$2BibTeX$$aINPROCEEDINGS
001029334 3367_ $$2DRIVER$$aconferenceObject
001029334 3367_ $$2ORCID$$aCONFERENCE_POSTER
001029334 3367_ $$2DataCite$$aOutput Types/Conference Poster
001029334 3367_ $$0PUB:(DE-HGF)24$$2PUB:(DE-HGF)$$aPoster$$bposter$$mposter$$s1731048220_24645$$xAfter Call
001029334 520__ $$aA key property of neural networks driving their success is their ability to learn features from data. Understanding feature learning from a theoretical viewpoint is an emerging field with many open questions. In this work we capture finite-width effects with a systematic theory of network kernels in deep non-linear neural networks. We show that the Bayesian prior of the network can be written in closed form as a superposition of Gaussian processes, whose kernels are distributed with a variance that depends inversely on the network width N. A large deviation approach, which is exact in the proportional limit for the number of data points P=αN→∞, yields a pair of forward-backward equations for the maximum a posteriori kernels in all layers at once. We study their solutions perturbatively to demonstrate how the backward propagation across layers aligns kernels with the target. An alternative field-theoretic formulation shows that kernel adaptation of the Bayesian posterior at finite width results from fluctuations in the prior: larger fluctuations correspond to a more flexible network prior and thus enable stronger adaptation to data. We thus find a bridge between the classical edge-of-chaos NNGP theory and feature learning, exposing an intricate interplay between criticality, response functions, and feature scale.
001029334 536__ $$0G:(DE-HGF)POF4-5232$$a5232 - Computational Principles (POF4-523)$$cPOF4-523$$fPOF IV$$x0
001029334 536__ $$0G:(DE-HGF)POF4-5234$$a5234 - Emerging NC Architectures (POF4-523)$$cPOF4-523$$fPOF IV$$x1
001029334 536__ $$0G:(DE-Juel-1)BMBF-01IS19077A$$aRenormalizedFlows - Transparent Deep Learning with Renormalized Flows (BMBF-01IS19077A)$$cBMBF-01IS19077A$$x2
001029334 536__ $$0G:(DE-Juel1)HGF-SMHB-2014-2018$$aMSNN - Theory of multi-scale neuronal networks (HGF-SMHB-2014-2018)$$cHGF-SMHB-2014-2018$$fMSNN$$x3
001029334 536__ $$0G:(DE-HGF)SO-092$$aACA - Advanced Computing Architectures (SO-092)$$cSO-092$$x4
001029334 7001_ $$0P:(DE-Juel1)185990$$aLindner, Javed$$b1$$eCorresponding author$$ufzj
001029334 7001_ $$0P:(DE-Juel1)156459$$aDahmen, David$$b2$$ufzj
001029334 7001_ $$0P:(DE-HGF)0$$aRingel, Zohar$$b3
001029334 7001_ $$0P:(DE-HGF)0$$aKrämer, Michael$$b4
001029334 7001_ $$0P:(DE-Juel1)144806$$aHelias, Moritz$$b5$$ufzj
001029334 909CO $$ooai:juser.fz-juelich.de:1029334$$pVDB
001029334 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180150$$aForschungszentrum Jülich$$b0$$kFZJ
001029334 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)185990$$aForschungszentrum Jülich$$b1$$kFZJ
001029334 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)156459$$aForschungszentrum Jülich$$b2$$kFZJ
001029334 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144806$$aForschungszentrum Jülich$$b5$$kFZJ
001029334 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5232$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x0
001029334 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5234$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x1
001029334 9141_ $$y2024
001029334 920__ $$lyes
001029334 9201_ $$0I:(DE-Juel1)IAS-6-20130828$$kIAS-6$$lComputational and Systems Neuroscience$$x0
001029334 980__ $$aposter
001029334 980__ $$aVDB
001029334 980__ $$aI:(DE-Juel1)IAS-6-20130828
001029334 980__ $$aUNRESTRICTED