001     1052361
005     20260127203442.0
024 7 _ |a 10.1145/3784828.3785255
|2 doi
024 7 _ |a 10.34734/FZJ-2026-00960
|2 datacite_doi
037 _ _ |a FZJ-2026-00960
041 _ _ |a English
100 1 _ |a Orland, Fabian
|0 0000-0002-8681-2661
|b 0
|e Corresponding author
111 2 _ |a SCA/HPCAsia 2026 Workshops: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops
|g SCA/HPCAsia 2026
|c Osaka
|d 2026-01-26 - 2026-01-29
|w Japan
245 _ _ |a Hybrid Inference Optimization for AI-Enhanced Turbulent Boundary Layer Simulation on Heterogeneous Systems
260 _ _ |a New York, NY, USA
|c 2026
|b ACM
295 1 0 |a Proceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops
300 _ _ |a 165-176
336 7 _ |a CONFERENCE_PAPER
|2 ORCID
336 7 _ |a Conference Paper
|0 33
|2 EndNote
336 7 _ |a INPROCEEDINGS
|2 BibTeX
336 7 _ |a conferenceObject
|2 DRIVER
336 7 _ |a Output Types/Conference Paper
|2 DataCite
336 7 _ |a Contribution to a conference proceedings
|b contrib
|m contrib
|0 PUB:(DE-HGF)8
|s 1769497819_26304
|2 PUB:(DE-HGF)
336 7 _ |a Contribution to a book
|0 PUB:(DE-HGF)7
|2 PUB:(DE-HGF)
|m contb
520 _ _ |a Active drag reduction (ADR) using spanwise traveling surface waves is a promising approach to reduce drag of airplanes by manipulating the turbulent boundary layer (TBL) around an airfoil, which directly translates into power savings and lower emission of greenhouse gases harming the environment. However, no analytical solution is known to determine the optimal actuation parameters of these surface waves based on given flow conditions. Data-driven deep learning (DL) techniques from artificial intelligence (AI) are a promising alternative approach, but their training requires a huge amount of high-fidelity data from computationally expensive computational fluid dynamics (CFD) simulations. Previous works proposed a TBL-Transformer architecture for the expensive time-marching of turbulent flow fields and coupled it with a finite volume solver from the multi-physics PDE solver framework m-AIA to accelerate the generation of TBL data. To accelerate the computationally expensive inference of the TBL-Transformer, the AIxeleratorService library was used to offload the inference task to GPUs. While this approach significantly accelerates the inference task, it leaves the CPU resources allocated by the solver unutilized during inference. To fully exploit modern heterogeneous computer systems, we introduce a hybrid inference method based on a hybrid work distribution model and implement it into the AIxeleratorService library. Moreover, we present a formal model to derive the optimal hybrid work distribution. To evaluate the computational performance and scalability of hybrid inference, we benchmark the coupled m-AIA solver from previous work on a heterogeneous HPC system comprising Intel Sapphire Rapids CPUs and NVIDIA H100 GPUs. Our results show that hybrid inference achieves a performance speedup that grows as the ratio of allocated CPU cores to GPU devices increases.
We further demonstrate that the runtime improvement by hybrid inference also increases the energy efficiency of the coupled solver application. Finally, we highlight that the theoretical hybrid work distribution derived from our formal model yields near-optimal results in practice.
536 _ _ |a 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5111
|c POF4-511
|f POF IV
|x 0
536 _ _ |a SDLFSE - SDL Fluids & Solids Engineering (SDLFSE)
|0 G:(DE-Juel-1)SDLFSE
|c SDLFSE
|x 1
536 _ _ |a RAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)
|0 G:(EU-Grant)951733
|c 951733
|f H2020-INFRAEDI-2019-1
|x 2
588 _ _ |a Dataset connected to CrossRef Conference
700 1 _ |a Hilgers, Tom
|0 0000-0002-7501-3936
|b 1
700 1 _ |a Hübenthal, Fabian
|0 0009-0000-7159-8220
|b 2
700 1 _ |a Sarma, Rakesh
|0 P:(DE-Juel1)188513
|b 3
|u fzj
700 1 _ |a Lintermann, Andreas
|0 P:(DE-Juel1)165948
|b 4
|u fzj
700 1 _ |a Terboven, Christian
|0 P:(DE-HGF)0
|b 5
770 _ _ |z 9798400723285
773 _ _ |a 10.1145/3784828.3785255
856 4 _ |u https://juser.fz-juelich.de/record/1052361/files/MMCP_2026_Orland_et_al_authorversion.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:1052361
|p openaire
|p open_access
|p driver
|p VDB
|p ec_fundedresources
|p dnbdelivery
910 1 _ |a RWTH Aachen
|0 I:(DE-588b)36225-6
|k RWTH
|b 0
|6 0000-0002-8681-2661
910 1 _ |a RWTH Aachen
|0 I:(DE-588b)36225-6
|k RWTH
|b 1
|6 0000-0002-7501-3936
910 1 _ |a RWTH Aachen
|0 I:(DE-588b)36225-6
|k RWTH
|b 2
|6 0009-0000-7159-8220
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)188513
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 4
|6 P:(DE-Juel1)165948
910 1 _ |a RWTH Aachen
|0 I:(DE-588b)36225-6
|k RWTH
|b 5
|6 P:(DE-HGF)0
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5111
|x 0
914 1 _ |y 2026
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a contrib
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a contb
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 1 _ |a FullTexts

