| 001 | 1052069 | ||
| 005 | 20260121204317.0 | ||
| 037 | _ | _ | |a FZJ-2026-00739 |
| 041 | _ | _ | |a English |
| 100 | 1 | _ | |a Rubin, Noa |0 P:(DE-HGF)0 |b 0 |e Corresponding author |
| 111 | 2 | _ | |a The 42nd International Conference on Machine Learning |g ICML 2025 |c Vancouver |d 2025-07-13 - 2025-07-19 |w Canada |
| 245 | _ | _ | |a From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning |
| 260 | _ | _ | |c 2025 |
| 336 | 7 | _ | |a Conference Paper |0 33 |2 EndNote |
| 336 | 7 | _ | |a Other |2 DataCite |
| 336 | 7 | _ | |a INPROCEEDINGS |2 BibTeX |
| 336 | 7 | _ | |a conferenceObject |2 DRIVER |
| 336 | 7 | _ | |a LECTURE_SPEECH |2 ORCID |
| 336 | 7 | _ | |a Conference Presentation |b conf |m conf |0 PUB:(DE-HGF)6 |s 1768997418_9544 |2 PUB:(DE-HGF) |x After Call |
| 520 | _ | _ | |a Feature learning in neural networks is crucial for their expressive power and inductive biases, motivating various theoretical approaches. Some approaches describe network behavior after training through a change in kernel scale from initialization, resulting in a generalization power comparable to a Gaussian process. Conversely, in other approaches training results in the adaptation of the kernel to the data, involving directional changes to the kernel. The relationship and respective strengths of these two views have so far remained unresolved. This work presents a theoretical framework of multi-scale adaptive feature learning bridging these two views. Using methods from statistical mechanics, we derive analytical expressions for network output statistics which are valid across scaling regimes and in the continuum between them. A systematic expansion of the network's probability distribution reveals that mean-field scaling requires only a saddle-point approximation, while standard scaling necessitates additional correction terms. Remarkably, we find across regimes that kernel adaptation can be reduced to an effective kernel rescaling when predicting the mean network output in the special case of a linear network. However, for linear and non-linear networks, the multi-scale adaptive approach captures directional feature learning effects, providing richer insights than what could be recovered from a rescaling of the kernel alone. |
| 536 | _ | _ | |a 5232 - Computational Principles (POF4-523) |0 G:(DE-HGF)POF4-5232 |c POF4-523 |f POF IV |x 0 |
| 536 | _ | _ | |a 5234 - Emerging NC Architectures (POF4-523) |0 G:(DE-HGF)POF4-5234 |c POF4-523 |f POF IV |x 1 |
| 536 | _ | _ | |a MSNN - Theory of multi-scale neuronal networks (HGF-SMHB-2014-2018) |0 G:(DE-Juel1)HGF-SMHB-2014-2018 |c HGF-SMHB-2014-2018 |f MSNN |x 2 |
| 536 | _ | _ | |a ACA - Advanced Computing Architectures (SO-092) |0 G:(DE-HGF)SO-092 |c SO-092 |x 3 |
| 536 | _ | _ | |a GRK 2416 - GRK 2416: MultiSenses-MultiScales: Neue Ansätze zur Aufklärung neuronaler multisensorischer Integration (368482240) |0 G:(GEPRIS)368482240 |c 368482240 |x 4 |
| 650 | 2 | 7 | |a Others |0 V:(DE-MLZ)SciArea-250 |2 V:(DE-HGF) |x 0 |
| 700 | 1 | _ | |a Fischer, Kirsten |0 P:(DE-Juel1)180150 |b 1 |e Corresponding author |u fzj |
| 700 | 1 | _ | |a Lindner, Javed |0 P:(DE-Juel1)185990 |b 2 |e Corresponding author |u fzj |
| 700 | 1 | _ | |a Dahmen, David |0 P:(DE-Juel1)156459 |b 3 |u fzj |
| 700 | 1 | _ | |a Seroussi, Inbar |0 P:(DE-HGF)0 |b 4 |
| 700 | 1 | _ | |a Ringel, Zohar |0 P:(DE-HGF)0 |b 5 |
| 700 | 1 | _ | |a Krämer, Michael |0 P:(DE-HGF)0 |b 6 |
| 700 | 1 | _ | |a Helias, Moritz |0 P:(DE-Juel1)144806 |b 7 |u fzj |
| 856 | 4 | _ | |u https://icml.cc/virtual/2025/poster/44430 |
| 909 | C | O | |o oai:juser.fz-juelich.de:1052069 |p VDB |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)180150 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 2 |6 P:(DE-Juel1)185990 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 3 |6 P:(DE-Juel1)156459 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 7 |6 P:(DE-Juel1)144806 |
| 913 | 1 | _ | |a DE-HGF |b Key Technologies |l Natural, Artificial and Cognitive Information Processing |1 G:(DE-HGF)POF4-520 |0 G:(DE-HGF)POF4-523 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Neuromorphic Computing and Network Dynamics |9 G:(DE-HGF)POF4-5232 |x 0 |
| 913 | 1 | _ | |a DE-HGF |b Key Technologies |l Natural, Artificial and Cognitive Information Processing |1 G:(DE-HGF)POF4-520 |0 G:(DE-HGF)POF4-523 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Neuromorphic Computing and Network Dynamics |9 G:(DE-HGF)POF4-5234 |x 1 |
| 920 | _ | _ | |l yes |
| 920 | 1 | _ | |0 I:(DE-Juel1)IAS-6-20130828 |k IAS-6 |l Computational and Systems Neuroscience |x 0 |
| 980 | _ | _ | |a conf |
| 980 | _ | _ | |a VDB |
| 980 | _ | _ | |a I:(DE-Juel1)IAS-6-20130828 |
| 980 | _ | _ | |a UNRESTRICTED |