Conference Presentation (After Call) FZJ-2025-05345

A New Spin on the Fast Multipole Method for GPUs: Rethinking the Far-Field Operators


2025

2025 IEEE International Parallel and Distributed Processing Symposium, IPDPS, Milano, Italy, 3 Jun 2025 - 7 Jun 2025

Abstract: The Fast Multipole Method (FMM) is an optimally efficient algorithm for solving N-body problems, a fundamental challenge in fields like astrophysics, plasma physics and molecular dynamics. It is particularly suited for computing the 1/r potentials present in Coulomb and gravitational particle systems. Although the near-field phase is trivially parallelisable, the far-field phase of the 1/r FMM currently lacks an efficient, massively parallel GPU algorithm fit for the era of exascale computing. Current state-of-the-art approaches favour either highly parallel but inefficient expansion shift operators or asymptotically efficient but poorly parallelisable rotation-based ones. Recently, a breakthrough was made with the re-evaluation of a rotation operator variant called fast rotation, which dramatically increases caching effectiveness and marries the advantages of both methods. This paper incorporates that approach to create fast rotation-based operators that facilitate an efficient far-field algorithm for the FMM on GPUs. Additionally, a warp-centric data access scheme is co-developed alongside a matching octree design, which yields coalesced memory access patterns for the bottleneck operators of the far-field phase. The fast rotation algorithm is enhanced with a cache-tiling mechanism, maximising GPU cache utilisation. Compared to the state-of-the-art GPU FMM far-field implementation, our algorithm achieves lower running times across the board and a 2.47x speedup for an increased-precision simulation, with the performance improvement growing as precision increases, providing concrete proof of efficacy for dense particle systems.
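
For orientation only, the N-body problem named in the abstract can be written as a direct pairwise sum of 1/r contributions, which costs O(N^2) operations. The sketch below is not taken from the presentation: the function name `direct_potentials` and the sample data are hypothetical, and physical constants and units are omitted. It shows the brute-force baseline whose far-field part the FMM replaces with hierarchical expansions.

```python
import math


def direct_potentials(positions, charges):
    """Return phi_i = sum over j != i of q_j / |x_i - x_j| (direct O(N^2) sum).

    Illustrative baseline only; prefactors and units are left out.
    """
    n = len(positions)
    phi = [0.0] * n
    for i in range(n):
        for j in range(i + 1, n):
            # Each pair is visited once and contributes to both particles.
            r = math.dist(positions[i], positions[j])
            phi[i] += charges[j] / r
            phi[j] += charges[i] / r
    return phi


# Hypothetical sample: three unit charges on a line at x = 0, 1 and 3.
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
charges = [1.0, 1.0, 1.0]
phi = direct_potentials(positions, charges)
print(phi)
```

Every pair of particles is touched here, so the work grows quadratically with the particle count. That cost is the reason for treating distant interactions separately, as the far-field phase described in the abstract does.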


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)

Appears in the scientific report 2025

The record appears in these collections:
Document types > Presentations > Conference Presentations
Workflow collections > Public records
Institute Collections > JSC
Publications database

 Record created 2025-12-12, last modified 2025-12-13


