001026442 001__ 1026442 001026442 005__ 20250822121332.0 001026442 0247_ $$2doi$$a10.5194/gmd-17-4077-2024 001026442 0247_ $$2ISSN$$a1991-959X 001026442 0247_ $$2ISSN$$a1991-9603 001026442 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-03395 001026442 0247_ $$2WOS$$aWOS:001226505800001 001026442 037__ $$aFZJ-2024-03395 001026442 041__ $$aEnglish 001026442 082__ $$a550 001026442 1001_ $$0P:(DE-Juel1)129125$$aHoffmann, Lars$$b0$$eCorresponding author$$ufzj 001026442 245__ $$aAccelerating Lagrangian transport simulations on graphics processing units: performance optimizations of Massive-Parallel Trajectory Calculations (MPTRAC) v2.6 001026442 260__ $$aKatlenburg-Lindau$$bCopernicus$$c2024 001026442 3367_ $$2DRIVER$$aarticle 001026442 3367_ $$2DataCite$$aOutput Types/Journal article 001026442 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1717745098_3799 001026442 3367_ $$2BibTeX$$aARTICLE 001026442 3367_ $$2ORCID$$aJOURNAL_ARTICLE 001026442 3367_ $$00$$2EndNote$$aJournal Article 001026442 520__ $$aLagrangian particle dispersion models are indispensable tools for the study of atmospheric transport processes. However, Lagrangian transport simulations can become numerically expensive when large numbers of air parcels are involved. To accelerate these simulations, we made considerable efforts to port the Massive-Parallel Trajectory Calculations (MPTRAC) model to graphics processing units (GPUs). Here we discuss performance optimizations of the major bottleneck of the GPU code of MPTRAC, the advection kernel. Timeline, roofline, and memory analyses of the baseline GPU code revealed that the application is memory-bound, and performance suffers from near-random memory access patterns. 
By changing the data structure of the horizontal wind and vertical velocity fields of the global meteorological data driving the simulations from structure of arrays (SoAs) to array of structures (AoSs) and by introducing a sorting method for better memory alignment of the particle data, performance was greatly improved. We evaluated the performance on NVIDIA A100 GPUs of the Jülich Wizard for European Leadership Science (JUWELS) Booster module at the Jülich Supercomputing Center, Germany. For our largest test case, transport simulations with 10⁸ particles driven by the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis, we found that the runtime for the full set of physics computations was reduced by 75 %, including a reduction of 85 % for the advection kernel. In addition to demonstrating the benefits of code optimization for GPUs, we show that the runtime of central processing unit (CPU)-only simulations is also improved. For our largest test case, we found a runtime reduction of 34 % for the physics computations, including a reduction of 65 % for the advection kernel. The code optimizations discussed here bring the MPTRAC model closer to applications on upcoming exascale high-performance computing systems and will also be of interest for optimizing the performance of other models using particle methods. 
001026442 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0 001026442 536__ $$0G:(DE-HGF)POF4-2112$$a2112 - Climate Feedbacks (POF4-211)$$cPOF4-211$$fPOF IV$$x1 001026442 536__ $$0G:(DE-HGF)POF4-5122$$a5122 - Future Computing & Big Data Systems (POF4-512)$$cPOF4-512$$fPOF IV$$x2 001026442 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x3 001026442 536__ $$0G:(DE-Juel-1)ATML-X-DEV$$aATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)$$cATML-X-DEV$$x4 001026442 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de 001026442 7001_ $$0P:(DE-Juel1)176293$$aHaghighi Mood, Kaveh$$b1 001026442 7001_ $$0P:(DE-Juel1)145478$$aHerten, Andreas$$b2 001026442 7001_ $$0P:(DE-HGF)0$$aHrywniak, Markus$$b3 001026442 7001_ $$0P:(DE-HGF)0$$aKraus, Jiri$$b4 001026442 7001_ $$0P:(DE-Juel1)180256$$aClemens, Jan$$b5 001026442 7001_ $$0P:(DE-Juel1)187051$$aLiu, Mingzhao$$b6 001026442 773__ $$0PERI:(DE-600)2456725-5$$a10.5194/gmd-17-4077-2024$$gVol. 17, no. 9, p. 
4077 - 4094$$n9$$p4077 - 4094$$tGeoscientific model development$$v17$$x1991-959X$$y2024 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/Invoice_Helmholtz-PUC-2024-57.pdf 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/Invoice_Helmholtz-PUC-2024-57.gif?subformat=icon$$xicon 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/Invoice_Helmholtz-PUC-2024-57.jpg?subformat=icon-1440$$xicon-1440 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/Invoice_Helmholtz-PUC-2024-57.jpg?subformat=icon-180$$xicon-180 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/Invoice_Helmholtz-PUC-2024-57.jpg?subformat=icon-640$$xicon-640 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/gmd-17-4077-2024.pdf$$yOpenAccess 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/gmd-17-4077-2024.gif?subformat=icon$$xicon$$yOpenAccess 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/gmd-17-4077-2024.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/gmd-17-4077-2024.jpg?subformat=icon-180$$xicon-180$$yOpenAccess 001026442 8564_ $$uhttps://juser.fz-juelich.de/record/1026442/files/gmd-17-4077-2024.jpg?subformat=icon-640$$xicon-640$$yOpenAccess 001026442 8767_ $$8Helmholtz-PUC-2024-57$$92024-05-22$$a1200203854$$d2024-05-29$$eAPC$$jZahlung erfolgt 001026442 909CO $$ooai:juser.fz-juelich.de:1026442$$pdnbdelivery$$popenCost$$pVDB$$pdriver$$pOpenAPC$$popen_access$$popenaire 001026442 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)129125$$aForschungszentrum Jülich$$b0$$kFZJ 001026442 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)176293$$aForschungszentrum Jülich$$b1$$kFZJ 001026442 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)145478$$aForschungszentrum Jülich$$b2$$kFZJ 001026442 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180256$$aForschungszentrum Jülich$$b5$$kFZJ 001026442 9101_ 
$$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)187051$$aForschungszentrum Jülich$$b6$$kFZJ 001026442 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0 001026442 9131_ $$0G:(DE-HGF)POF4-211$$1G:(DE-HGF)POF4-210$$2G:(DE-HGF)POF4-200$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-2112$$aDE-HGF$$bForschungsbereich Erde und Umwelt$$lErde im Wandel – Unsere Zukunft nachhaltig gestalten$$vDie Atmosphäre im globalen Wandel$$x1 001026442 9131_ $$0G:(DE-HGF)POF4-512$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5122$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vSupercomputing & Big Data Infrastructures$$x2 001026442 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x3 001026442 9141_ $$y2024 001026442 915pc $$0PC:(DE-HGF)0000$$2APC$$aAPC keys set 001026442 915pc $$0PC:(DE-HGF)0003$$2APC$$aDOAJ Journal 001026442 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2023-10-25 001026442 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0 001026442 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2022-12-20T09:29:04Z 001026442 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2022-12-20T09:29:04Z 001026442 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2023-10-25 
001026442 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2023-10-25 001026442 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess 001026442 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2023-10-25 001026442 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-21 001026442 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-12-21 001026442 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Open peer review$$d2022-12-20T09:29:04Z 001026442 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2024-12-21 001026442 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2024-12-21 001026442 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-12-21 001026442 915__ $$0StatID:(DE-HGF)1150$$2StatID$$aDBCoverage$$bCurrent Contents - Physical, Chemical and Earth Sciences$$d2024-12-21 001026442 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-12-21 001026442 920__ $$lyes 001026442 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0 001026442 9201_ $$0I:(DE-Juel1)IEK-7-20101013$$kIEK-7$$lStratosphäre$$x1 001026442 9201_ $$0I:(DE-Juel1)CASA-20230315$$kCASA$$lCenter for Advanced Simulation and Analytics$$x2 001026442 9801_ $$aAPC 001026442 9801_ $$aFullTexts 001026442 980__ $$ajournal 001026442 980__ $$aVDB 001026442 980__ $$aUNRESTRICTED 001026442 980__ $$aI:(DE-Juel1)JSC-20090406 001026442 980__ $$aI:(DE-Juel1)IEK-7-20101013 001026442 980__ $$aI:(DE-Juel1)CASA-20230315 001026442 980__ $$aAPC 001026442 981__ $$aI:(DE-Juel1)ICE-4-20101013