Journal Article FZJ-2021-00076

A CUDA fast multipole method with highly efficient M2L far field evaluation


2021
Sage Science Press, Thousand Oaks, Calif.

The international journal of high performance computing applications 35(1), 97-117 (2021) [10.1177/1094342020964857]


Abstract: Solving an N-body problem, electrostatic or gravitational, is a crucial task and the main computational bottleneck in many scientific applications. Its direct solution is a ubiquitous showcase example for the compute power of graphics processing units (GPUs). However, the naïve pairwise summation has 𝒪(𝑁²) computational complexity. The fast multipole method (FMM) can reduce runtime and complexity to 𝒪(𝑁) for any specified precision. Here, we present a CUDA-accelerated C++ FMM implementation for multi-particle systems with an 𝑟⁻¹ potential, as found, e.g., in biomolecular simulations. The algorithm involves several operators to exchange information in an octree data structure. We focus on the Multipole-to-Local (M2L) operator, as its runtime is limiting for the overall performance. We propose, implement and benchmark three different M2L parallelization approaches. Approach (1) utilizes Unified Memory to minimize programming and porting efforts. It achieves decent speedups with little implementation work. Approach (2) employs CUDA Dynamic Parallelism to significantly improve performance for high approximation accuracies. The presorted list-based approach (3) fits periodic boundary conditions particularly well. It exploits FMM operator symmetries to minimize both memory access and the number of complex multiplications. The result is a compute-bound implementation, i.e., performance is limited by arithmetic operations rather than by memory accesses. The complete CUDA-parallelized FMM is incorporated within the GROMACS molecular dynamics package as an alternative Coulomb solver.
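
For orientation, the naïve pairwise summation that the abstract contrasts with the FMM can be sketched in a few lines of C++. This is only an illustrative sketch, not code from the article: the `Particle` struct and the function name `direct_potential` are hypothetical, a unit prefactor is assumed, and particles are assumed not to coincide.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical particle type for illustration: position and charge.
struct Particle {
    double x, y, z;
    double q;
};

// Direct pairwise summation of the r^-1 potential:
//   phi[i] = sum over j != i of q_j / |r_i - r_j|   (unit prefactor assumed)
// The nested loops visit every ordered pair, hence O(N^2) work.
// Assumes no two particles share the same position.
std::vector<double> direct_potential(const std::vector<Particle>& p) {
    const std::size_t n = p.size();
    std::vector<double> phi(n, 0.0);
    for (std::size_t i = 0; i < n; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            if (i == j) continue;  // skip self-interaction
            const double dx = p[i].x - p[j].x;
            const double dy = p[i].y - p[j].y;
            const double dz = p[i].z - p[j].z;
            phi[i] += p[j].q / std::sqrt(dx * dx + dy * dy + dz * dz);
        }
    }
    return phi;
}
```

Doubling the particle count quadruples the number of pair evaluations in this loop nest, which is the scaling the FMM is designed to avoid.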

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)

Appears in the scientific report 2021
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; Allianz-Lizenz ; Clarivate Analytics Master Journal List ; Current Contents - Engineering, Computing and Technology ; Ebsco Academic Search ; Essential Science Indicators ; IF < 5 ; JCR ; Nationallizenz ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2021-01-06, last modified 2022-02-28

