000906961 001__ 906961
000906961 005__ 20240313103131.0
000906961 0247_ $$2arXiv$$aarXiv:2203.11355
000906961 0247_ $$2doi$$a10.48550/arXiv.2203.11355
000906961 0247_ $$2Handle$$a2128/31010
000906961 037__ $$aFZJ-2022-01779
000906961 088__ $$2arXiv$$aarXiv:2203.11355
000906961 1001_ $$0P:(DE-Juel1)171384$$aKeup, Christian$$b0$$eCorresponding author$$ufzj
000906961 245__ $$aOrigami in N dimensions: How feed-forward networks manufacture linear separability
000906961 260__ $$barXiv$$c2022
000906961 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1649410110_12135
000906961 3367_ $$2ORCID$$aWORKING_PAPER
000906961 3367_ $$028$$2EndNote$$aElectronic Article
000906961 3367_ $$2DRIVER$$apreprint
000906961 3367_ $$2BibTeX$$aARTICLE
000906961 3367_ $$2DataCite$$aOutput Types/Working Paper
000906961 520__ $$aNeural networks can implement arbitrary functions. But, mechanistically, what are the tools at their disposal to construct the target? For classification tasks, the network must transform the data classes into a linearly separable representation in the final hidden layer. We show that a feed-forward architecture has one primary tool at hand to achieve this separability: progressive folding of the data manifold in unoccupied higher dimensions. The operation of folding provides a useful intuition in low dimensions that generalizes to high ones. We argue that an alternative method based on shear, requiring very deep architectures, plays only a small role in real-world networks. The folding operation, however, is powerful as long as layers are wider than the data dimensionality, allowing efficient solutions by providing access to arbitrary regions in the distribution, such as data points of one class forming islands within the other classes. We argue that a link exists between the universal approximation property in ReLU networks and the fold-and-cut theorem (Demaine et al., 1998) dealing with physical paper folding. Based on the mechanistic insight, we predict that the progressive generation of separability is necessarily accompanied by neurons showing mixed selectivity and bimodal tuning curves. This is validated in a network trained on the poker hand task, showing the emergence of bimodal tuning curves during training. We hope that our intuitive picture of the data transformation in deep networks can help to provide interpretability, and discuss possible applications to the theory of convolutional networks, loss landscapes, and generalization. TL;DR: Shows that the internal processing of deep networks can be thought of as literal folding operations on the data distribution in the N-dimensional activation space. A link to a well-known theorem in origami theory is provided.
000906961 536__ $$0G:(DE-HGF)POF4-5232$$a5232 - Computational Principles (POF4-523)$$cPOF4-523$$fPOF IV$$x0
000906961 536__ $$0G:(DE-Juel-1)BMBF-01IS19077A$$aRenormalizedFlows - Transparent Deep Learning with Renormalized Flows (BMBF-01IS19077A)$$cBMBF-01IS19077A$$x1
000906961 536__ $$0G:(DE-82)EXS-SF-neuroIC002$$aneuroIC002 - Recurrence and stochasticity for neuro-inspired computation (EXS-SF-neuroIC002)$$cEXS-SF-neuroIC002$$x2
000906961 536__ $$0G:(DE-Juel-1)PF-JARA-SDS005$$aSDS005 - Towards an integrated data science of complex natural systems (PF-JARA-SDS005)$$cPF-JARA-SDS005$$x3
000906961 536__ $$0G:(GEPRIS)368482240$$aGRK 2416 - GRK 2416: MultiSenses-MultiScales: Neue Ansätze zur Aufklärung neuronaler multisensorischer Integration (368482240)$$c368482240$$x4
000906961 588__ $$aDataset connected to arXiv
000906961 650_7 $$2Other$$aMachine Learning (cs.LG)
000906961 650_7 $$2Other$$aDisordered Systems and Neural Networks (cond-mat.dis-nn)
000906961 650_7 $$2Other$$aMachine Learning (stat.ML)
000906961 650_7 $$2Other$$aFOS: Computer and information sciences
000906961 650_7 $$2Other$$aFOS: Physical sciences
000906961 7001_ $$0P:(DE-Juel1)144806$$aHelias, Moritz$$b1$$ufzj
000906961 773__ $$a10.48550/arXiv.2203.11355
000906961 8564_ $$uhttps://juser.fz-juelich.de/record/906961/files/Keup2022%20-%20Origami%20in%20N%20dimensions_%20How%20feed-forward%20networks%20manufacture%20linear%20separability.pdf$$yOpenAccess
000906961 909CO $$ooai:juser.fz-juelich.de:906961$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000906961 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000906961 9141_ $$y2022
000906961 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)171384$$aForschungszentrum Jülich$$b0$$kFZJ
000906961 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144806$$aForschungszentrum Jülich$$b1$$kFZJ
000906961 9131_ $$0G:(DE-HGF)POF4-523$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5232$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vNeuromorphic Computing and Network Dynamics$$x0
000906961 920__ $$lyes
000906961 9201_ $$0I:(DE-Juel1)INM-6-20090406$$kINM-6$$lComputational and Systems Neuroscience$$x0
000906961 9201_ $$0I:(DE-Juel1)IAS-6-20130828$$kIAS-6$$lTheoretical Neuroscience$$x1
000906961 9201_ $$0I:(DE-Juel1)INM-10-20170113$$kINM-10$$lJara-Institut Brain structure-function relationships$$x2
000906961 9801_ $$aFullTexts
000906961 980__ $$apreprint
000906961 980__ $$aVDB
000906961 980__ $$aUNRESTRICTED
000906961 980__ $$aI:(DE-Juel1)INM-6-20090406
000906961 980__ $$aI:(DE-Juel1)IAS-6-20130828
000906961 980__ $$aI:(DE-Juel1)INM-10-20170113
000906961 981__ $$aI:(DE-Juel1)IAS-6-20130828