001053126 001__ 1053126
001053126 005__ 20260130140312.0
001053126 0247_ $$2arXiv$$aarXiv:2502.01247
001053126 037__ $$aFZJ-2026-01459
001053126 088__ $$2arXiv$$aarXiv:2502.01247
001053126 1001_ $$0P:(DE-Juel1)199801$$aKhalfaoui, Ismail$$b0$$ufzj
001053126 245__ $$aPolynomial, trigonometric, and tropical activations
001053126 260__ $$barXiv$$c2025
001053126 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1769766704_29794
001053126 3367_ $$2ORCID$$aWORKING_PAPER
001053126 3367_ $$028$$2EndNote$$aElectronic Article
001053126 3367_ $$2DRIVER$$apreprint
001053126 3367_ $$2BibTeX$$aARTICLE
001053126 3367_ $$2DataCite$$aOutput Types/Working Paper
001053126 520__ $$aWhich functions can be used as activations in deep neural networks? This article explores families of functions based on orthonormal bases, including the Hermite polynomial basis and the Fourier trigonometric basis, as well as a basis resulting from the tropicalization of a polynomial basis. Our study shows that, through simple variance-preserving initialization and without additional clamping mechanisms, these activations can successfully be used to train deep models, such as GPT-2 for next-token prediction on OpenWebText and ConvNeXt for image classification on ImageNet. Our work addresses the issue of exploding and vanishing activations and gradients, particularly prevalent with polynomial activations, and opens the door for improving the efficiency of large-scale learning tasks. Furthermore, our approach provides insight into the structure of neural networks, revealing that networks with polynomial activations can be interpreted as multivariate polynomial mappings. Finally, using Hermite interpolation, we show that our activations can closely approximate classical ones in pre-trained models by matching both the function and its derivative, making them especially useful for fine-tuning tasks. These activations are available in the torchortho library, which can be accessed via: https://github.com/K-H-Ismail/torchortho.
001053126 536__ $$0G:(DE-Juel-1)E54.303.11$$aHelmholtz AI Consultant Team FB Information (E54.303.11)$$cE54.303.11$$x0
001053126 536__ $$0G:(BMWK)19A23014l$$anxtAIM – NXT GEN AI Methods (19A23014l)$$c19A23014l$$x1
001053126 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x2
001053126 588__ $$aDataset connected to arXiv
001053126 650_7 $$2Other$$aMachine Learning (cs.LG)
001053126 650_7 $$2Other$$aArtificial Intelligence (cs.AI)
001053126 650_7 $$2Other$$aComputation and Language (cs.CL)
001053126 650_7 $$2Other$$aComputer Vision and Pattern Recognition (cs.CV)
001053126 650_7 $$2Other$$aAlgebraic Geometry (math.AG)
001053126 650_7 $$2Other$$aFOS: Computer and information sciences
001053126 650_7 $$2Other$$aFOS: Mathematics
001053126 7001_ $$0P:(DE-Juel1)185654$$aKesselheim, Stefan$$b1
001053126 8564_ $$uhttps://juser.fz-juelich.de/record/1053126/files/2502.01247v2.pdf$$yRestricted
001053126 909CO $$ooai:juser.fz-juelich.de:1053126$$pextern4vita
001053126 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)199801$$aForschungszentrum Jülich$$b0$$kFZJ
001053126 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)185654$$aForschungszentrum Jülich$$b1$$kFZJ
001053126 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001053126 9801_ $$aEXTERN4VITA
001053126 980__ $$apreprint
001053126 980__ $$aEDITORS
001053126 980__ $$aI:(DE-Juel1)JSC-20090406