Accelerating Lagrangian transport simulations on graphics processing units: performance optimizations of Massive-Parallel Trajectory Calculations (MPTRAC) v2.6

Hoffmann, Lars; Liu, Mingzhao; Kraus, Jiri; Herten, Andreas; Haghighi Mood, Kaveh; Hrywniak, Markus; Clemens, Jan

doi:10.5194/gmd-17-4077-2024

Type	Amount	VAT	Currency	Share	Status	Cost centre
APC	1600.00	0.00	EUR	100.00 %	(payment made)	ZB
Sum	1600.00	0.00	EUR
Total	1600.00

Journal Article

FZJ-2024-03395

Hoffmann, L. (Corresponding author) FZJ* ; Haghighi Mood, K. FZJ* ; Herten, A. FZJ* ; Hrywniak, M. ; Kraus, J. ; Clemens, J. FZJ* ; Liu, M. FZJ*

2024
Copernicus, Katlenburg-Lindau

Geoscientific Model Development 17(9), 4077-4094 (2024) [10.5194/gmd-17-4077-2024]

Please use a persistent id in citations: doi:10.5194/gmd-17-4077-2024 doi:10.34734/FZJ-2024-03395

Abstract: Lagrangian particle dispersion models are indispensable tools for the study of atmospheric transport processes. However, Lagrangian transport simulations can become numerically expensive when large numbers of air parcels are involved. To accelerate these simulations, we made considerable efforts to port the Massive-Parallel Trajectory Calculations (MPTRAC) model to graphics processing units (GPUs). Here we discuss performance optimizations of the major bottleneck of the GPU code of MPTRAC, the advection kernel. Timeline, roofline, and memory analyses of the baseline GPU code revealed that the application is memory-bound, and performance suffers from near-random memory access patterns. By changing the data structure of the horizontal wind and vertical velocity fields of the global meteorological data driving the simulations from structure of arrays (SoAs) to array of structures (AoSs) and by introducing a sorting method for better memory alignment of the particle data, performance was greatly improved. We evaluated the performance on NVIDIA A100 GPUs of the Jülich Wizard for European Leadership Science (JUWELS) Booster module at the Jülich Supercomputing Centre, Germany. For our largest test case, transport simulations with 10⁸ particles driven by the European Centre for Medium-Range Weather Forecasts (ECMWF) ERA5 reanalysis, we found that the runtime for the full set of physics computations was reduced by 75 %, including a reduction of 85 % for the advection kernel. In addition to demonstrating the benefits of code optimization for GPUs, we show that the runtime of central processing unit (CPU-)only simulations is also improved. For our largest test case, we found a runtime reduction of 34 % for the physics computations, including a reduction of 65 % for the advection kernel.
The code optimizations discussed here bring the MPTRAC model closer to applications on upcoming exascale high-performance computing systems and will also be of interest for optimizing the performance of other models using particle methods.
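
The two optimizations named in the abstract can be illustrated with a small sketch. It is not taken from MPTRAC: the grid sizes, the function names (`cell_index`, `wind_soa`, `wind_aos`, `sort_by_cell`), and the choice of the linear cell index as sort key are assumptions made for illustration. Python lists do not reproduce real memory layout, so the index arithmetic only mirrors what a flat array in C would do.

```python
# Illustrative sketch only; all names and sizes are hypothetical, not MPTRAC's.

NX, NY, NP = 4, 3, 2          # tiny lon/lat/pressure grid (assumed sizes)
NCELL = NX * NY * NP


def cell_index(ix, iy, ip):
    """Flatten a 3-D grid index into a linear cell index."""
    return (ix * NY + iy) * NP + ip


# SoA layout: one separate array per wind component.
u = [float(i) for i in range(NCELL)]
v = [100.0 + i for i in range(NCELL)]
w = [200.0 + i for i in range(NCELL)]

# AoS layout: u, v, w of the same grid cell are stored next to each other.
uvw = [0.0] * (3 * NCELL)
for i in range(NCELL):
    uvw[3 * i + 0] = u[i]
    uvw[3 * i + 1] = v[i]
    uvw[3 * i + 2] = w[i]


def wind_soa(ix, iy, ip):
    """Read the wind at a cell from three separate arrays (three distant reads)."""
    i = cell_index(ix, iy, ip)
    return (u[i], v[i], w[i])


def wind_aos(ix, iy, ip):
    """Read the wind at a cell from one contiguous triple."""
    i = cell_index(ix, iy, ip)
    return tuple(uvw[3 * i:3 * i + 3])


def sort_by_cell(particles):
    """Order particles by the linear index of the grid cell they occupy,
    so that neighbouring particles in memory read neighbouring wind data."""
    return sorted(particles, key=lambda p: cell_index(*p))


if __name__ == "__main__":
    parcels = [(3, 0, 1), (0, 1, 0), (3, 0, 0), (0, 1, 0)]
    for ix, iy, ip in sort_by_cell(parcels):
        print((ix, iy, ip), wind_aos(ix, iy, ip))
```

Both layouts return the same values; the difference is that the AoS read touches one contiguous block per grid cell, and sorting makes consecutive particles touch nearby blocks, which is the kind of access pattern a memory-bound kernel benefits from.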

Classification:

ddc:550

Contributing Institute(s):

Research Program(s):

Appears in the scientific report 2024

Database coverage:
Medline ; Article Processing Charges ; Clarivate Analytics Master Journal List ; Current Contents - Physical, Chemical and Earth Sciences ; DOAJ Seal ; Ebsco Academic Search ; Essential Science Indicators ; Fees ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > ICE > ICE-4
Workflow collections > Public records
Workflow collections > Publication Charges
Institute Collections > JSC
IEK > IEK-7
Publications database
Open Access

Record created 2024-05-17, last modified 2026-01-22
