A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators

Lengvenis, Arijus; Kabadshow, Ivo; Morgenstern, Laura; Dachsel, Holger

Marc 21

001			1049541
005			20251213202221.0
037	_	_	\|a FZJ-2025-05345
100	1	_	\|a Lengvenis, Arijus \|0 P:(DE-Juel1)206763 \|b 0 \|e Corresponding author \|u fzj
111	2	_	\|a 2025 IEEE International Parallel and Distributed Processing Symposium \|g IPDPS \|c Milano \|d 2025-06-03 - 2025-06-07 \|w Italy
245	_	_	\|a A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators
260	_	_	\|c 2025
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a Other \|2 DataCite
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a LECTURE_SPEECH \|2 ORCID
336	7	_	\|a Conference Presentation \|b conf \|m conf \|0 PUB:(DE-HGF)6 \|s 1765628766_19099 \|2 PUB:(DE-HGF) \|x After Call
520	_	_	\|a The Fast Multipole Method (FMM) is an optimally efficient algorithm for solving N-body problems: a fundamental challenge in fields like astrophysics, plasma physics and molecular dynamics. It is particularly suited for computing 1/r potentials present in Coulomb and gravitational particle systems. Despite the near-field phase being trivially parallelisable, the far-field phase of the 1/r FMM currently lacks an efficient, massively parallel GPU algorithm fitting for the era of Exascale computing. Current state-of-the-art approaches either favor highly parallel but inefficient expansion shift operators or asymptotically efficient but poorly parallelisable rotation-based ones. Recently, a breakthrough was made with the re-evaluation of a rotation operator variant called fast rotation, which dramatically increases caching effectiveness and marries the advantages of both methods. Thus, this paper incorporates this approach to create fast rotation-based operators that facilitate an efficient far-field algorithm for the FMM on GPUs. Additionally, a warp-centric data access scheme is co-developed alongside a matching octree design, which yields coalesced memory access patterns for the bottleneck operators of the far-field phase. The fast rotation algorithm is enhanced with a cache-tiling mechanism, maximising GPU cache utilisation. Compared to the state-of-the-art GPU FMM far-field implementation, our algorithm achieves lower running times across the board and a 2.47x speedup for an increased precision simulation, with the performance improvement growing as precision increases, providing concrete proof of efficacy for dense particle systems.
536	_	_	\|a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) \|0 G:(DE-HGF)POF4-5112 \|c POF4-511 \|f POF IV \|x 0
700	1	_	\|a Dachsel, Holger \|0 P:(DE-Juel1)132079 \|b 1 \|u fzj
700	1	_	\|a Morgenstern, Laura \|0 P:(DE-Juel1)169856 \|b 2
700	1	_	\|a Kabadshow, Ivo \|0 P:(DE-Juel1)132152 \|b 3 \|u fzj
909	C	O	\|o oai:juser.fz-juelich.de:1049541 \|p VDB
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)206763
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 1 \|6 P:(DE-Juel1)132079
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 3 \|6 P:(DE-Juel1)132152
913	1	_	\|a DE-HGF \|b Key Technologies \|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action \|1 G:(DE-HGF)POF4-510 \|0 G:(DE-HGF)POF4-511 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Enabling Computational- & Data-Intensive Science and Engineering \|9 G:(DE-HGF)POF4-5112 \|x 0
914	1	_	\|y 2025
920	_	_	\|l yes
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a conf
980	_	_	\|a VDB
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	_	_	\|a UNRESTRICTED
