001     1031520
005     20241105214338.0
024 7 _ |a 10.3389/fhpcp.2024.1444337
|2 doi
024 7 _ |a 10.34734/FZJ-2024-05715
|2 datacite_doi
037 _ _ |a FZJ-2024-05715
041 _ _ |a English
100 1 _ |a Sarma, Rakesh
|0 P:(DE-Juel1)188513
|b 0
|e Corresponding author
|u fzj
245 _ _ |a Parallel and scalable AI in HPC systems for CFD applications and beyond
260 _ _ |c 2024
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1730752278_15994
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
500 _ _ |a Missing Journal: Frontiers in High Performance Computing (Front. High Perform. Comput.) = 2813-7337 (import from CrossRef, Journals: juser.fz-juelich.de); Please add the journal to the list of journals
520 _ _ |a This manuscript presents the library AI4HPC with its architecture and components. The library enables large-scale trainings of AI models on High-Performance Computing systems. It addresses challenges in handling non-uniform datasets through data manipulation routines, model complexity through specialized ML architectures, scalability through extensive code optimizations that augment performance, HyperParameter Optimization (HPO), and performance monitoring. The scalability of the library is demonstrated by strong scaling experiments on up to 3,664 Graphical Processing Units (GPUs) resulting in a scaling efficiency of 96%, using the performance on 1 node as baseline. Furthermore, code optimizations and communication/computation bottlenecks are discussed for training a neural network on an actuated Turbulent Boundary Layer (TBL) simulation dataset (8.3 TB) on the HPC system JURECA at the Jülich Supercomputing Centre. The distributed training approach significantly influences the accuracy, which can be drastically compromised by varying mini-batch sizes. Therefore, AI4HPC implements learning rate scaling and adaptive summation algorithms, which are tested and evaluated in this work. For the TBL use case, results scaled up to 64 workers are shown. A further increase in the number of workers causes an additional overhead due to too small dataset samples per worker. Finally, the library is applied for the reconstruction of TBL flows with a convolutional autoencoder-based architecture and a diffusion model. In case of the autoencoder, a modal decomposition shows that the network provides accurate reconstructions of the underlying field and achieves a mean drag prediction error of ≈5%. With the diffusion model, a reconstruction error of ≈4% is achieved when super-resolution is applied to 5-fold coarsened velocity fields. The AI4HPC library is agnostic to the underlying network and can be adapted across various scientific and technical disciplines.
536 _ _ |a 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5111
|c POF4-511
|f POF IV
|x 0
536 _ _ |a RAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)
|0 G:(EU-Grant)951733
|c 951733
|f H2020-INFRAEDI-2019-1
|x 1
588 _ _ |a Dataset connected to CrossRef, Journals: juser.fz-juelich.de
700 1 _ |a Inanc, Eray
|0 P:(DE-Juel1)188268
|b 1
700 1 _ |a Aach, Marcel
|0 P:(DE-Juel1)180916
|b 2
700 1 _ |a Lintermann, Andreas
|0 P:(DE-Juel1)165948
|b 3
773 _ _ |a 10.3389/fhpcp.2024.1444337
|p 1444337
|y 2024
|g Vol. 2, p. 1444337
|v 2
856 4 _ |u https://www.frontiersin.org/articles/10.3389/fhpcp.2024.1444337/full
856 4 _ |u https://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.pdf
|y OpenAccess
856 4 _ |u https://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.gif?subformat=icon
|x icon
|y OpenAccess
856 4 _ |u https://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.jpg?subformat=icon-1440
|x icon-1440
|y OpenAccess
856 4 _ |u https://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.jpg?subformat=icon-180
|x icon-180
|y OpenAccess
856 4 _ |u https://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.jpg?subformat=icon-640
|x icon-640
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:1031520
|p openaire
|p open_access
|p OpenAPC
|p driver
|p VDB
|p ec_fundedresources
|p openCost
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)188513
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)188268
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)180916
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)165948
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5111
|x 0
914 1 _ |y 2024
915 p c |a APC keys set
|0 PC:(DE-HGF)0000
|2 APC
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a APC
980 1 _ |a APC
980 1 _ |a FullTexts

