TY  - CONF
AU  - Beckmann, Andreas
AU  - Kabadshow, Ivo
TI  - Portable Node-Level Performance Optimization for the Fast Multipole Method
VL  - 105
CY  - Cham
PB  - Springer International Publishing
M1  - FZJ-2016-00642
SN  - 978-3-319-22996-6
T2  - Lecture Notes in Computational Science and Engineering
SP  - 29
EP  - 46
PY  - 2015
AB  - This article provides an in-depth analysis and high-level C++ optimization strategies for the most time-consuming kernels of a Fast Multipole Method (FMM). The two main kernels of a Coulomb FMM are formulated to support different hardware features, such as unrolling, vectorization or threading without the need to rewrite the kernels in intrinsics or even assembly. The abstract description of the algorithm automatically allows optimal node-level peak performance on a broad class of available hardware platforms. Most of the presented optimization schemes allow a generic, hence platform-independent description for other kernels as well.
T2  - 3rd International Workshop on Computational Engineering
CY  - 6 Oct 2014 - 10 Oct 2014, Stuttgart (Germany)
Y2  - 6 Oct 2014 - 10 Oct 2014
M2  - Stuttgart, Germany
LB  - PUB:(DE-HGF)8
DO  - 10.1007/978-3-319-22997-3_2
UR  - https://juser.fz-juelich.de/record/280931
ER  -