001031520 001__ 1031520
001031520 005__ 20241105214338.0
001031520 0247_ $$2doi$$a10.3389/fhpcp.2024.1444337
001031520 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-05715
001031520 037__ $$aFZJ-2024-05715
001031520 041__ $$aEnglish
001031520 1001_ $$0P:(DE-Juel1)188513$$aSarma, Rakesh$$b0$$eCorresponding author$$ufzj
001031520 245__ $$aParallel and scalable AI in HPC systems for CFD applications and beyond
001031520 260__ $$c2024
001031520 3367_ $$2DRIVER$$aarticle
001031520 3367_ $$2DataCite$$aOutput Types/Journal article
001031520 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1730752278_15994
001031520 3367_ $$2BibTeX$$aARTICLE
001031520 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001031520 3367_ $$00$$2EndNote$$aJournal Article
001031520 500__ $$aMissing Journal: Frontiers in High Performance Computing (Front. High Perform. Comput.) = 2813-7337 (import from CrossRef, Journals: juser.fz-juelich.de); Please add the journal to the list of journals
001031520 520__ $$aThis manuscript presents the library AI4HPC with its architecture and components. The library enables large-scale training of AI models on High-Performance Computing (HPC) systems. It addresses challenges in handling non-uniform datasets through data manipulation routines, model complexity through specialized ML architectures, scalability through extensive code optimizations that augment performance, HyperParameter Optimization (HPO), and performance monitoring. The scalability of the library is demonstrated by strong scaling experiments on up to 3,664 Graphics Processing Units (GPUs), achieving a scaling efficiency of 96% relative to the performance on 1 node. Furthermore, code optimizations and communication/computation bottlenecks are discussed for training a neural network on an actuated Turbulent Boundary Layer (TBL) simulation dataset (8.3 TB) on the HPC system JURECA at the Jülich Supercomputing Centre. The distributed training approach significantly influences the accuracy, which can be drastically compromised by varying mini-batch sizes. Therefore, AI4HPC implements learning rate scaling and adaptive summation algorithms, which are tested and evaluated in this work. For the TBL use case, results are shown for up to 64 workers. Increasing the number of workers further causes additional overhead because each worker receives too few dataset samples. Finally, the library is applied to the reconstruction of TBL flows with a convolutional autoencoder-based architecture and a diffusion model. In the case of the autoencoder, a modal decomposition shows that the network provides accurate reconstructions of the underlying field and achieves a mean drag prediction error of ≈5%. With the diffusion model, a reconstruction error of ≈4% is achieved when super-resolution is applied to 5-fold coarsened velocity fields. The AI4HPC library is agnostic to the underlying network and can be adapted across various scientific and technical disciplines.
001031520 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001031520 536__ $$0G:(EU-Grant)951733$$aRAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)$$c951733$$fH2020-INFRAEDI-2019-1$$x1
001031520 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001031520 7001_ $$0P:(DE-Juel1)188268$$aInanc, Eray$$b1
001031520 7001_ $$0P:(DE-Juel1)180916$$aAach, Marcel$$b2
001031520 7001_ $$0P:(DE-Juel1)165948$$aLintermann, Andreas$$b3
001031520 773__ $$a10.3389/fhpcp.2024.1444337$$gVol. 2, p. 1444337$$p1444337$$v2$$y2024
001031520 8564_ $$uhttps://www.frontiersin.org/articles/10.3389/fhpcp.2024.1444337/full
001031520 8564_ $$uhttps://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.pdf$$yOpenAccess
001031520 8564_ $$uhttps://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.gif?subformat=icon$$xicon$$yOpenAccess
001031520 8564_ $$uhttps://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001031520 8564_ $$uhttps://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001031520 8564_ $$uhttps://juser.fz-juelich.de/record/1031520/files/fhpcp-02-1444337.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001031520 8767_ $$d2024-10-14$$eAPC$$jDeposit$$z$ 1806.25
001031520 909CO $$ooai:juser.fz-juelich.de:1031520$$popenaire$$popen_access$$pOpenAPC$$pdriver$$pVDB$$pec_fundedresources$$popenCost$$pdnbdelivery
001031520 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188513$$aForschungszentrum Jülich$$b0$$kFZJ
001031520 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188268$$aForschungszentrum Jülich$$b1$$kFZJ
001031520 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180916$$aForschungszentrum Jülich$$b2$$kFZJ
001031520 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)165948$$aForschungszentrum Jülich$$b3$$kFZJ
001031520 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001031520 9141_ $$y2024
001031520 915pc $$0PC:(DE-HGF)0000$$2APC$$aAPC keys set
001031520 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001031520 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001031520 920__ $$lyes
001031520 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001031520 980__ $$ajournal
001031520 980__ $$aVDB
001031520 980__ $$aUNRESTRICTED
001031520 980__ $$aI:(DE-Juel1)JSC-20090406
001031520 980__ $$aAPC
001031520 9801_ $$aAPC
001031520 9801_ $$aFullTexts