Accelerating a C++ CFD Code with OpenACC

Kraus, Jiri; Adinets, Andrey; Pleiter, Dirk; Schlottke, Michael

doi:10.1109/WACCPD.2014.11

Marc 21

001			281260
005			20210129221655.0
024	7	_	|a 10.1109/WACCPD.2014.11 |2 doi
024	7	_	|a altmetric:21828280 |2 altmetric
037	_	_	|a FZJ-2016-00959
041	_	_	|a English
100	1	_	|a Kraus, Jiri |0 P:(DE-HGF)0 |b 0
111	2	_	|a 2014 First Workshop on Accelerator Programming using Directives (WACCPD) |c New Orleans |d 2014-11-17 - 2014-11-17 |w LA
245	_	_	|a Accelerating a C++ CFD Code with OpenACC
260	_	_	|c 2014 |b IEEE
295	1	0	|a 2014 First Workshop on Accelerator Programming using Directives : [Proceedings] - ISBN 978-1-4673-6753-0 -
300	_	_	|a 47-54
336	7	_	|a Contribution to a conference proceedings |b contrib |m contrib |0 PUB:(DE-HGF)8 |s 1454505599_9274 |2 PUB:(DE-HGF)
336	7	_	|a Conference Paper |0 33 |2 EndNote
336	7	_	|a CONFERENCE_PAPER |2 ORCID
336	7	_	|a Output Types/Conference Paper |2 DataCite
336	7	_	|a conferenceObject |2 DRIVER
336	7	_	|a INPROCEEDINGS |2 BibTeX
500	_	_	|3 POF3_Assignment on 2016-02-29
520	_	_	|a Today's HPC systems increasingly use accelerators to reduce time to solution for their users and to lower power consumption. To exploit the higher performance and energy efficiency of these accelerators, application developers need to rewrite at least parts of their codes. Taking the C++ flow solver ZFS as an example, we show that the directive-based programming model makes it possible to achieve good performance with reasonable effort, even for mature codes with a large code base. OpenACC directives allowed us to accelerate ZFS incrementally, focusing on the parts of the program that are relevant to the problem at hand. This required two features new in OpenACC 2.0: unstructured data regions and atomics. OpenACC's interoperability with existing GPU libraries via the host_data use_device construct allowed us to use CUDA-aware MPI and achieve multi-GPU scalability comparable to that of the CPU version of ZFS. As in many other codes, the data structures of ZFS were designed with traditional CPUs and their relatively large private caches in mind. This leads to suboptimal memory access patterns on accelerators such as GPUs. We show how the texture cache on NVIDIA GPUs can be used to minimize the performance impact of these suboptimal patterns without writing platform-specific code. For the kernel most affected by the memory access pattern, we compare the initial array-of-structures (AoS) memory layout with a structure-of-arrays (SoA) layout.
536	_	_	|a 513 - Supercomputer Facility (POF3-513) |0 G:(DE-HGF)POF3-513 |c POF3-513 |f POF III |x 0
536	_	_	|a 41G - Supercomputer Facility (POF2-41G21) |0 G:(DE-HGF)POF2-41G21 |c POF2-41G21 |f POF II |x 1
588	_	_	|a Dataset connected to CrossRef Conference
700	1	_	|a Schlottke, Michael |0 P:(DE-Juel1)145740 |b 1
700	1	_	|a Adinets, Andrey |0 P:(DE-Juel1)157723 |b 2
700	1	_	|a Pleiter, Dirk |0 P:(DE-Juel1)144441 |b 3
773	_	_	|a 10.1109/WACCPD.2014.11
856	4	_	|u https://juser.fz-juelich.de/record/281260/files/07081677.pdf |y Restricted
856	4	_	|x icon |u https://juser.fz-juelich.de/record/281260/files/07081677.gif?subformat=icon |y Restricted
856	4	_	|x icon-1440 |u https://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-1440 |y Restricted
856	4	_	|x icon-180 |u https://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-180 |y Restricted
856	4	_	|x icon-640 |u https://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-640 |y Restricted
856	4	_	|x pdfa |u https://juser.fz-juelich.de/record/281260/files/07081677.pdf?subformat=pdfa |y Restricted
909	C	O	|o oai:juser.fz-juelich.de:281260 |p VDB
910	1	_	|a Forschungszentrum Jülich GmbH |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)145740
910	1	_	|a Forschungszentrum Jülich GmbH |0 I:(DE-588b)5008462-8 |k FZJ |b 3 |6 P:(DE-Juel1)144441
913	2	_	|a DE-HGF |b Key Technologies |l Supercomputing & Big Data |1 G:(DE-HGF)POF3-510 |0 G:(DE-HGF)POF3-519H |2 G:(DE-HGF)POF3-500 |v Addenda |x 0
913	1	_	|a DE-HGF |b Key Technologies |1 G:(DE-HGF)POF3-510 |0 G:(DE-HGF)POF3-513 |2 G:(DE-HGF)POF3-500 |v Supercomputer Facility |x 0 |4 G:(DE-HGF)POF |3 G:(DE-HGF)POF3 |l Supercomputing & Big Data
913	1	_	|a DE-HGF |b Schlüsseltechnologien |l Supercomputing |1 G:(DE-HGF)POF2-410 |0 G:(DE-HGF)POF2-41G21 |2 G:(DE-HGF)POF2-400 |v Supercomputer Facility |x 1 |4 G:(DE-HGF)POF |3 G:(DE-HGF)POF2
914	1	_	|y 2015
915	_	_	|a No Authors Fulltext |0 StatID:(DE-HGF)0550 |2 StatID
920	_	_	|l yes
920	1	_	|0 I:(DE-Juel1)JSC-20090406 |k JSC |l Jülich Supercomputing Center |x 0
980	_	_	|a contrib
980	_	_	|a VDB
980	_	_	|a UNRESTRICTED
980	_	_	|a I:(DE-Juel1)JSC-20090406
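
The abstract names several concrete OpenACC techniques: unstructured data regions, atomics, read-only (texture) cache use, and an AoS versus SoA layout comparison. The C++ sketch below illustrates them on a toy kernel. It is not code from ZFS or from the paper: the types CellAoS and CellsSoA and the functions kinetic_aos, kinetic_soa, kinetic_energy_aos, kinetic_energy_soa and scatter_fluxes are hypothetical. Marking inputs const and __restrict__ so that the compiler may route loads through the read-only/texture cache is an assumption about one common approach, not necessarily the authors' method. The host_data use_device path to CUDA-aware MPI is not shown. Without an OpenACC compiler the pragmas are ignored and the loops run serially on the CPU.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical cell data; names are illustrative, not taken from ZFS.
// Array of structures (AoS): all variables of one cell are contiguous.
struct CellAoS {
    double rho, u, v, w;
};

// Structure of arrays (SoA): one contiguous array per variable.
struct CellsSoA {
    std::vector<double> rho, u, v, w;
};

// AoS kernel: consecutive GPU threads read addresses sizeof(CellAoS) bytes
// apart, so loads coalesce poorly. const + __restrict__ declares the input
// read-only and unaliased, which may allow the compiler to route the loads
// through the read-only (texture) cache on NVIDIA GPUs (assumption).
void kinetic_aos(const CellAoS* __restrict__ c, double* __restrict__ ke,
                 std::size_t n) {
#pragma acc parallel loop present(c[0:n], ke[0:n])
    for (std::size_t i = 0; i < n; ++i)
        ke[i] = 0.5 * c[i].rho *
                (c[i].u * c[i].u + c[i].v * c[i].v + c[i].w * c[i].w);
}

// SoA kernel: consecutive threads read consecutive doubles.
void kinetic_soa(const double* __restrict__ rho, const double* __restrict__ u,
                 const double* __restrict__ v, const double* __restrict__ w,
                 double* __restrict__ ke, std::size_t n) {
#pragma acc parallel loop present(rho[0:n], u[0:n], v[0:n], w[0:n], ke[0:n])
    for (std::size_t i = 0; i < n; ++i)
        ke[i] = 0.5 * rho[i] * (u[i] * u[i] + v[i] * v[i] + w[i] * w[i]);
}

// Unstructured data regions (OpenACC 2.0): "enter data" and "exit data" are
// standalone directives, so they could also sit in different functions,
// e.g. a class constructor and destructor.
std::vector<double> kinetic_energy_aos(const std::vector<CellAoS>& cells) {
    const std::size_t n = cells.size();
    std::vector<double> ke(n);
    const CellAoS* c = cells.data();
    double* k = ke.data();
#pragma acc enter data copyin(c[0:n]) create(k[0:n])
    kinetic_aos(c, k, n);
#pragma acc exit data copyout(k[0:n]) delete(c[0:n])
    return ke;
}

std::vector<double> kinetic_energy_soa(const CellsSoA& s) {
    const std::size_t n = s.rho.size();
    assert(s.u.size() == n && s.v.size() == n && s.w.size() == n);
    std::vector<double> ke(n);
    const double *rho = s.rho.data(), *u = s.u.data(), *v = s.v.data(),
                 *w = s.w.data();
    double* k = ke.data();
#pragma acc enter data copyin(rho[0:n], u[0:n], v[0:n], w[0:n]) create(k[0:n])
    kinetic_soa(rho, u, v, w, k, n);
#pragma acc exit data copyout(k[0:n]) delete(rho[0:n], u[0:n], v[0:n], w[0:n])
    return ke;
}

// Scatter per-face fluxes to the two adjacent cells. Two faces may update
// the same cell concurrently, so each update is atomic.
std::vector<double> scatter_fluxes(const std::vector<std::size_t>& left,
                                   const std::vector<std::size_t>& right,
                                   const std::vector<double>& flux,
                                   std::size_t ncells) {
    const std::size_t nf = flux.size();
    std::vector<double> res(ncells, 0.0);
    const std::size_t *l = left.data(), *r = right.data();
    const double* fl = flux.data();
    double* out = res.data();
#pragma acc parallel loop copyin(l[0:nf], r[0:nf], fl[0:nf]) copy(out[0:ncells])
    for (std::size_t f = 0; f < nf; ++f) {
#pragma acc atomic update
        out[l[f]] -= fl[f];
#pragma acc atomic update
        out[r[f]] += fl[f];
    }
    return res;
}
```

Both layouts compute the same values; the difference on a GPU is only in how memory is accessed. In the SoA kernel neighbouring threads load neighbouring doubles, whereas the AoS kernel strides by the size of the whole struct. The atomics in scatter_fluxes are needed only because two faces can share a cell.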
