%0 Conference Paper
%A Beckmann, Andreas
%A Kabadshow, Ivo
%T Portable Node-Level Performance Optimization for the Fast Multipole Method
%V 105
%C Cham
%I Springer International Publishing
%M FZJ-2016-00642
%@ 978-3-319-22996-6
%B Lecture Notes in Computational Science and Engineering
%P 29-46
%D 2015
%< Recent Trends in Computational Engineering - CE2014
%X This article provides an in-depth analysis of, and high-level C++ optimization strategies for, the most time-consuming kernels of a Fast Multipole Method (FMM). The two main kernels of a Coulomb FMM are formulated to support different hardware features, such as unrolling, vectorization, or threading, without the need to rewrite the kernels in intrinsics or even assembly. The abstract description of the algorithm automatically allows optimal node-level peak performance on a broad class of available hardware platforms. Most of the presented optimization schemes allow a generic, hence platform-independent, description for other kernels as well.
%B 3rd International Workshop on Computational Engineering
%C 6 Oct 2014 - 10 Oct 2014, Stuttgart (Germany)
Y2 6 Oct 2014 - 10 Oct 2014
M2 Stuttgart, Germany
%F PUB:(DE-HGF)8
%9 Contribution to a conference proceedings
%R 10.1007/978-3-319-22997-3_2
%U https://juser.fz-juelich.de/record/280931