From Kernels to Features: A Multi-Scale Adaptive Theory of Feature Learning
Conference Presentation (After Call) | FZJ-2026-00739
2025
Abstract: Feature learning in neural networks is crucial for their expressive power and inductive biases, motivating various theoretical approaches. Some approaches describe network behavior after training through a change in kernel scale from initialization, resulting in a generalization power comparable to a Gaussian process. Conversely, in other approaches training results in the adaptation of the kernel to the data, involving directional changes to the kernel. The relationship and respective strengths of these two views have so far remained unresolved. This work presents a theoretical framework of multi-scale adaptive feature learning bridging these two views. Using methods from statistical mechanics, we derive analytical expressions for network output statistics which are valid across scaling regimes and in the continuum between them. A systematic expansion of the network's probability distribution reveals that mean-field scaling requires only a saddle-point approximation, while standard scaling necessitates additional correction terms. Remarkably, we find across regimes that kernel adaptation can be reduced to an effective kernel rescaling when predicting the mean network output in the special case of a linear network. However, for linear and non-linear networks, the multi-scale adaptive approach captures directional feature learning effects, providing richer insights than what could be recovered from a rescaling of the kernel alone.
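The abstract's distinction between a scalar kernel rescaling and a directional kernel adaptation can be made concrete with a Gaussian-process-style mean predictor, which the abstract itself invokes as the reference point for the rescaling view. The following NumPy sketch is purely illustrative, not the paper's derivation: the RBF base kernel, the "learned feature" phi(x) = sin(x), and the parameters rho, delta, and the noise level are all assumptions made for this sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

def rbf_kernel(X1, X2, length_scale=1.0):
    """Squared-exponential (RBF) kernel matrix between two point sets."""
    d2 = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * d2 / length_scale**2)

def posterior_mean(K, k_star, y, noise):
    """GP-regression mean predictor: k_* (K + noise*I)^{-1} y."""
    return k_star @ np.linalg.solve(K + noise * np.eye(len(y)), y)

# Toy 1-D regression task (all choices here are assumptions for the sketch).
X = rng.uniform(-3, 3, size=(30, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(30)
X_test = np.linspace(-3, 3, 100)[:, None]
noise = 1e-2

K = rbf_kernel(X, X)
k_star = rbf_kernel(X_test, X)

# (a) Scalar rescaling K -> rho*K: shifts the effective regularization
#     strength but leaves the kernel's eigendirections untouched.
rho = 2.0
mean_rescaled = posterior_mean(rho * K, rho * k_star, y, noise)

# (b) Directional adaptation: add a rank-one term along an assumed
#     learned feature phi(x) = sin(x), changing *which* directions the
#     kernel amplifies rather than only by how much.
def adapted_kernel(X1, X2, delta=0.5):
    phi1, phi2 = np.sin(X1[:, 0]), np.sin(X2[:, 0])
    return rbf_kernel(X1, X2) + delta * np.outer(phi1, phi2)

mean_adapted = posterior_mean(adapted_kernel(X, X),
                              adapted_kernel(X_test, X), y, noise)

target = np.sin(X_test[:, 0])
for name, m in [("rescaled", mean_rescaled), ("adapted", mean_adapted)]:
    print(f"{name:9s} test MSE: {np.mean((m - target) ** 2):.4f}")
```

In this toy setting the scalar rescaling only rebalances data fit against regularization, so at small noise its mean predictor barely moves, whereas the rank-one directional term aligns the kernel with the target function and changes the prediction qualitatively; this mirrors, in a much simpler form, the abstract's point that directional feature learning effects are not recoverable from a rescaling of the kernel alone.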
Keyword(s): Others (2nd)