Journal Article FZJ-2025-04891

Field theory for optimal signal propagation in residual networks


2025
Inst. Woodbury, NY

Physical Review E 112(6), 065301 (2025) [10.1103/5lgz-4t7h]


Please use a persistent identifier in citations: doi:10.1103/5lgz-4t7h

Abstract: Residual networks have significantly better trainability and thus performance than feed-forward networks at large depth. Introducing skip connections facilitates signal propagation to deeper layers. In addition, previous works found that adding a scaling parameter for the residual branch further improves generalization performance. While they empirically identified a particularly beneficial range of values for this scaling parameter, the mechanism for the resulting performance improvement and its universality across network hyperparameters remain an open question. For feed-forward networks, finite-size theories have led to important insights with regard to signal propagation and hyperparameter tuning. We here derive a systematic finite-size field theory for residual networks to study signal propagation and its dependence on the scaling for the residual branch. We derive analytical expressions for the response function, a measure for the network’s sensitivity to inputs, and show that for deep networks the empirically found values for the scaling parameter lie within the range of maximal sensitivity. Furthermore, we obtain an analytical expression for the optimal scaling parameter that depends only weakly on other network hyperparameters, such as the weight variance, thereby explaining its universality across hyperparameters. Overall, this work provides a theoretical framework to study ResNets at finite size.
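The scaled residual branch discussed in the abstract can be illustrated with a toy simulation. The sketch below propagates a random input through layers of the form x ← x + α·φ(Wx) and reports the resulting signal strength for several values of the scaling parameter α. All choices here (tanh nonlinearity, Gaussian weights with variance σ_w²/width, the norm as a signal proxy) are illustrative assumptions, not the parametrization or response-function analysis of the paper itself.

```python
import numpy as np

def resnet_signal(depth, width, alpha, sigma_w=1.0, seed=0):
    """Propagate a random input through a toy residual network
    x_{l+1} = x_l + alpha * tanh(W_l x_l) and return the root-mean-square
    of the final activation, a crude proxy for signal strength.
    Illustrative sketch only; not the parametrization used in the paper."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal(width)
    for _ in range(depth):
        # Gaussian weights with variance sigma_w^2 / width (mean-field scaling)
        W = rng.standard_normal((width, width)) * sigma_w / np.sqrt(width)
        x = x + alpha * np.tanh(W @ x)
    return np.linalg.norm(x) / np.sqrt(width)

if __name__ == "__main__":
    for alpha in (0.1, 0.5, 1.0):
        print(f"alpha={alpha}: final signal strength "
              f"{resnet_signal(depth=100, width=256, alpha=alpha):.2f}")
```

Because the skip connection always passes the signal through, the signal strength grows with depth at a rate controlled by α, which is the knob whose optimal value the paper characterizes analytically via the response function.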

Contributing Institute(s):
  1. Computational and Systems Neuroscience (IAS-6)
Research Program(s):
  1. 5232 - Computational Principles (POF4-523)
  2. 5234 - Emerging NC Architectures (POF4-523)
  3. RenormalizedFlows - Transparent Deep Learning with Renormalized Flows (BMBF-01IS19077A)
  4. MSNN - Theory of multi-scale neuronal networks (HGF-SMHB-2014-2018)
  5. ACA - Advanced Computing Architectures (SO-092)
  6. neuroIC002 - Recurrence and stochasticity for neuro-inspired computation (EXS-SF-neuroIC002)
  7. DFG project G:(GEPRIS)491111487 - Open-Access-Publikationskosten / 2025 - 2027 / Forschungszentrum Jülich (OAPKFZJ) (491111487)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; Clarivate Analytics Master Journal List ; Current Contents - Electronics and Telecommunications Collection ; Current Contents - Physical, Chemical and Earth Sciences ; Ebsco Academic Search ; Essential Science Indicators ; IF < 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > IAS > IAS-6
Workflow collections > Public records
Publications database
Open Access

 Record created 2025-12-02, last modified 2025-12-22


OpenAccess:
Download fulltext PDF