Portable Node-Level Performance Optimization for the Fast Multipole Method

Beckmann, Andreas; Kabadshow, Ivo

doi:10.1007/978-3-319-22997-3_2

Marc 21

001			280931
005			20210129221443.0
020	_	_	\|a 978-3-319-22996-6
020	_	_	\|a 978-3-319-22997-3 (electronic)
024	7	_	\|a 10.1007/978-3-319-22997-3_2 \|2 doi
037	_	_	\|a FZJ-2016-00642
041	_	_	\|a English
082	_	_	\|a 510
100	1	_	\|a Beckmann, Andreas \|0 P:(DE-Juel1)157750 \|b 0 \|e Corresponding author
111	2	_	\|a 3rd International Workshop on Computational Engineering \|g CE 2014 \|c Stuttgart \|d 2014-10-06 - 2014-10-10 \|w Germany
245	_	_	\|a Portable Node-Level Performance Optimization for the Fast Multipole Method
260	_	_	\|a Cham \|c 2015 \|b Springer International Publishing
295	1	0	\|a Recent Trends in Computational Engineering - CE2014
300	_	_	\|a 29 - 46
336	7	_	\|a Contribution to a conference proceedings \|b contrib \|m contrib \|0 PUB:(DE-HGF)8 \|s 1453379840_2884 \|2 PUB:(DE-HGF)
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a CONFERENCE_PAPER \|2 ORCID
336	7	_	\|a Output Types/Conference Paper \|2 DataCite
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
490	0	_	\|a Lecture Notes in Computational Science and Engineering \|v 105
520	_	_	\|a This article provides an in-depth analysis and high-level C++ optimization strategies for the most time-consuming kernels of a Fast Multipole Method (FMM). The two main kernels of a Coulomb FMM are formulated to support different hardware features, such as unrolling, vectorization or threading without the need to rewrite the kernels in intrinsics or even assembly. The abstract description of the algorithm automatically allows optimal node-level peak performance on a broad class of available hardware platforms. Most of the presented optimization schemes allow a generic, hence platform-independent description for other kernels as well.
536	_	_	\|a 511 - Computational Science and Mathematical Methods (POF3-511) \|0 G:(DE-HGF)POF3-511 \|c POF3-511 \|f POF III \|x 0
536	_	_	\|0 G:(GEPRIS)230673686 \|x 1 \|c 230673686 \|a GromEx - Highly Scalable Unified Long-Range Electrostatics and Flexible Ionization for Realistic Biomolecular Simulations on the Exascale (230673686)
588	_	_	\|a Dataset connected to CrossRef Book Series
700	1	_	\|a Kabadshow, Ivo \|0 P:(DE-Juel1)132152 \|b 1
773	_	_	\|a 10.1007/978-3-319-22997-3_2
909	C	O	\|o oai:juser.fz-juelich.de:280931 \|p VDB
910	1	_	\|a Forschungszentrum Jülich GmbH \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)157750
910	1	_	\|a Forschungszentrum Jülich GmbH \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 1 \|6 P:(DE-Juel1)132152
913	1	_	\|a DE-HGF \|b Key Technologies \|1 G:(DE-HGF)POF3-510 \|0 G:(DE-HGF)POF3-511 \|2 G:(DE-HGF)POF3-500 \|v Computational Science and Mathematical Methods \|x 0 \|4 G:(DE-HGF)POF \|3 G:(DE-HGF)POF3 \|l Supercomputing & Big Data
914	1	_	\|y 2015
915	_	_	\|a No Authors Fulltext \|0 StatID:(DE-HGF)0550 \|2 StatID
920	_	_	\|l yes
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a contrib
980	_	_	\|a VDB
980	_	_	\|a UNRESTRICTED
980	_	_	\|a I:(DE-Juel1)JSC-20090406
