001     1053126
005     20260130140312.0
024 7 _ |a arXiv:2502.01247
|2 arXiv
037 _ _ |a FZJ-2026-01459
088 _ _ |a arXiv:2502.01247
|2 arXiv
100 1 _ |a Khalfaoui, Ismail
|0 P:(DE-Juel1)199801
|b 0
|u fzj
245 _ _ |a Polynomial, trigonometric, and tropical activations
260 _ _ |c 2025
|b arXiv
336 7 _ |a Preprint
|b preprint
|m preprint
|0 PUB:(DE-HGF)25
|s 1769766704_29794
|2 PUB:(DE-HGF)
336 7 _ |a WORKING_PAPER
|2 ORCID
336 7 _ |a Electronic Article
|0 28
|2 EndNote
336 7 _ |a preprint
|2 DRIVER
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a Output Types/Working Paper
|2 DataCite
520 _ _ |a Which functions can be used as activations in deep neural networks? This article explores families of functions based on orthonormal bases, including the Hermite polynomial basis and the Fourier trigonometric basis, as well as a basis resulting from the tropicalization of a polynomial basis. Our study shows that, through simple variance-preserving initialization and without additional clamping mechanisms, these activations can successfully be used to train deep models, such as GPT-2 for next-token prediction on OpenWebText and ConvNeXt for image classification on ImageNet. Our work addresses the issue of exploding and vanishing activations and gradients, particularly prevalent with polynomial activations, and opens the door for improving the efficiency of large-scale learning tasks. Furthermore, our approach provides insight into the structure of neural networks, revealing that networks with polynomial activations can be interpreted as multivariate polynomial mappings. Finally, using Hermite interpolation, we show that our activations can closely approximate classical ones in pre-trained models by matching both the function and its derivative, making them especially useful for fine-tuning tasks. These activations are available in the torchortho library, which can be accessed via: https://github.com/K-H-Ismail/torchortho.
536 _ _ |a Helmholtz AI Consultant Team FB Information (E54.303.11)
|0 G:(DE-Juel-1)E54.303.11
|c E54.303.11
|x 0
536 _ _ |a nxtAIM - nxtAIM – NXT GEN AI Methods (19A23014l)
|0 G:(BMWK)19A23014l
|c 19A23014l
|x 1
536 _ _ |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5112
|c POF4-511
|f POF IV
|x 2
588 _ _ |a Dataset connected to arXiv
650 _ 7 |a Machine Learning (cs.LG)
|2 Other
650 _ 7 |a Artificial Intelligence (cs.AI)
|2 Other
650 _ 7 |a Computation and Language (cs.CL)
|2 Other
650 _ 7 |a Computer Vision and Pattern Recognition (cs.CV)
|2 Other
650 _ 7 |a Algebraic Geometry (math.AG)
|2 Other
650 _ 7 |a FOS: Computer and information sciences
|2 Other
650 _ 7 |a FOS: Mathematics
|2 Other
700 1 _ |a Kesselheim, Stefan
|0 P:(DE-Juel1)185654
|b 1
856 4 _ |u https://juser.fz-juelich.de/record/1053126/files/2502.01247v2.pdf
|y Restricted
909 C O |o oai:juser.fz-juelich.de:1053126
|p extern4vita
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)199801
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)185654
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5112
|x 0
980 1 _ |a EXTERN4VITA
980 _ _ |a preprint
980 _ _ |a EDITORS
980 _ _ |a I:(DE-Juel1)JSC-20090406

