001052361 001__ 1052361
001052361 005__ 20260127203442.0
001052361 0247_ $$2doi$$a10.1145/3784828.3785255
001052361 0247_ $$2datacite_doi$$a10.34734/FZJ-2026-00960
001052361 037__ $$aFZJ-2026-00960
001052361 041__ $$aEnglish
001052361 1001_ $$00000-0002-8681-2661$$aOrland, Fabian$$b0$$eCorresponding author
001052361 1112_ $$aSCA/HPCAsia 2026 Workshops: Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops$$cOsaka$$d2026-01-26 - 2026-01-29$$gSCA/HPCAsia 2026$$wJapan
001052361 245__ $$aHybrid Inference Optimization for AI-Enhanced Turbulent Boundary Layer Simulation on Heterogeneous Systems
001052361 260__ $$aNew York, NY, USA$$bACM$$c2026
001052361 29510 $$aProceedings of the Supercomputing Asia and International Conference on High Performance Computing in Asia Pacific Region Workshops
001052361 300__ $$a165-176
001052361 3367_ $$2ORCID$$aCONFERENCE_PAPER
001052361 3367_ $$033$$2EndNote$$aConference Paper
001052361 3367_ $$2BibTeX$$aINPROCEEDINGS
001052361 3367_ $$2DRIVER$$aconferenceObject
001052361 3367_ $$2DataCite$$aOutput Types/Conference Paper
001052361 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1769497819_26304
001052361 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
001052361 520__ $$aActive drag reduction (ADR) using spanwise traveling surface waves is a promising approach to reduce drag of airplanes by manipulating the turbulent boundary layer (TBL) around an airfoil, which directly translates into power savings and lower emission of greenhouse gases harming the environment. However, no analytical solution is known to determine the optimal actuation parameters of these surface waves based on given flow conditions. Data-driven deep learning (DL) techniques from artificial intelligence (AI) are a promising alternative approach, but their training requires a huge amount of high-fidelity data from computationally expensive computational fluid dynamics (CFD) simulations. Previous works proposed a TBL-Transformer architecture for the expensive time-marching of turbulent flow fields and coupled it with a finite volume solver from the multi-physics PDE solver framework m-AIA to accelerate the generation of TBL data. To accelerate the computationally expensive inference of the TBL-Transformer, the AIxeleratorService library was used to offload the inference task to GPUs. While this approach significantly accelerates the inference task, it leaves the CPU resources allocated by the solver unutilized during inference. To fully exploit modern heterogeneous computer systems, we introduce a hybrid inference method based on a hybrid work distribution model and implement it into the AIxeleratorService library. Moreover, we present a formal model to derive the optimal hybrid work distribution. To evaluate the computational performance and scalability of hybrid inference, we benchmark the coupled m-AIA solver from previous work on a heterogeneous HPC system comprising Intel Sapphire Rapids CPUs and NVIDIA H100 GPUs. Our results show that hybrid inference achieves a performance speedup that grows as the ratio of allocated CPU cores to GPU devices increases. We further demonstrate that the runtime improvement by hybrid inference also increases the energy efficiency of the coupled solver application. Finally, we highlight that the theoretical hybrid work distribution derived from our formal model yields near-optimal results in practice.
001052361 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001052361 536__ $$0G:(DE-Juel-1)SDLFSE$$aSDLFSE - SDL Fluids & Solids Engineering (SDLFSE)$$cSDLFSE$$x1
001052361 536__ $$0G:(EU-Grant)951733$$aRAISE - Research on AI- and Simulation-Based Engineering at Exascale (951733)$$c951733$$fH2020-INFRAEDI-2019-1$$x2
001052361 588__ $$aDataset connected to CrossRef Conference
001052361 7001_ $$00000-0002-7501-3936$$aHilgers, Tom$$b1
001052361 7001_ $$00009-0000-7159-8220$$aHübenthal, Fabian$$b2
001052361 7001_ $$0P:(DE-Juel1)188513$$aSarma, Rakesh$$b3$$ufzj
001052361 7001_ $$0P:(DE-Juel1)165948$$aLintermann, Andreas$$b4$$ufzj
001052361 7001_ $$0P:(DE-HGF)0$$aTerboven, Christian$$b5
001052361 770__ $$z9798400723285
001052361 773__ $$a10.1145/3784828.3785255
001052361 8564_ $$uhttps://juser.fz-juelich.de/record/1052361/files/MMCP_2026_Orland_et_al_authorversion.pdf$$yOpenAccess
001052361 909CO $$ooai:juser.fz-juelich.de:1052361$$popenaire$$popen_access$$pdriver$$pVDB$$pec_fundedresources$$pdnbdelivery
001052361 9101_ $$0I:(DE-588b)36225-6$$60000-0002-8681-2661$$aRWTH Aachen$$b0$$kRWTH
001052361 9101_ $$0I:(DE-588b)36225-6$$60000-0002-7501-3936$$aRWTH Aachen$$b1$$kRWTH
001052361 9101_ $$0I:(DE-588b)36225-6$$60009-0000-7159-8220$$aRWTH Aachen$$b2$$kRWTH
001052361 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)188513$$aForschungszentrum Jülich$$b3$$kFZJ
001052361 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)165948$$aForschungszentrum Jülich$$b4$$kFZJ
001052361 9101_ $$0I:(DE-588b)36225-6$$6P:(DE-HGF)0$$aRWTH Aachen$$b5$$kRWTH
001052361 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001052361 9141_ $$y2026
001052361 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001052361 920__ $$lyes
001052361 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001052361 980__ $$acontrib
001052361 980__ $$aVDB
001052361 980__ $$aUNRESTRICTED
001052361 980__ $$acontb
001052361 980__ $$aI:(DE-Juel1)JSC-20090406
001052361 9801_ $$aFullTexts