000281260 001__ 281260
000281260 005__ 20210129221655.0
000281260 0247_ $$2doi$$a10.1109/WACCPD.2014.11
000281260 0247_ $$2altmetric$$aaltmetric:21828280
000281260 037__ $$aFZJ-2016-00959
000281260 041__ $$aEnglish
000281260 1001_ $$0P:(DE-HGF)0$$aKraus, Jiri$$b0
000281260 1112_ $$a2014 First Workshop on Accelerator Programming using Directives (WACCPD)$$cNew Orleans$$d2014-11-17 - 2014-11-17$$wLA
000281260 245__ $$aAccelerating a C++ CFD Code with OpenACC
000281260 260__ $$bIEEE$$c2014
000281260 29510 $$a2014 First Workshop on Accelerator Programming using Directives : [Proceedings] -  ISBN 978-1-4673-6753-0 -
000281260 300__ $$a47-54
000281260 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1454505599_9274
000281260 3367_ $$033$$2EndNote$$aConference Paper
000281260 3367_ $$2ORCID$$aCONFERENCE_PAPER
000281260 3367_ $$2DataCite$$aOutput Types/Conference Paper
000281260 3367_ $$2DRIVER$$aconferenceObject
000281260 3367_ $$2BibTeX$$aINPROCEEDINGS
000281260 500__ $$3POF3_Assignment on 2016-02-29
000281260 520__ $$aToday's HPC systems are increasingly utilizing accelerators to lower time to solution for their users and reduce power consumption. To utilize the higher performance and energy efficiency of these accelerators, application developers need to rewrite at least parts of their codes. Taking the C++ flow solver ZFS as an example, we show that the directive-based programming model allows one to achieve good performance with reasonable effort, even for mature codes with many lines of code. Using OpenACC directives permitted us to incrementally accelerate ZFS, focusing on the parts of the program that are relevant for the problem at hand. Two new OpenACC 2.0 features, unstructured data regions and atomics, were required for this. OpenACC's interoperability with existing GPU libraries via the host_data use_device construct allowed us to use CUDA-aware MPI to achieve multi-GPU scalability comparable to the CPU version of ZFS. Like many other codes, the data structures of ZFS have been designed with traditional CPUs and their relatively large private caches in mind. This leads to suboptimal memory access patterns on accelerators, such as GPUs. We show how the texture cache on NVIDIA GPUs can be used to minimize the performance impact of these suboptimal patterns without writing platform-specific code. For the kernel most affected by the memory access pattern, we compare the initial array-of-structures memory layout with a structure-of-arrays layout.
000281260 536__ $$0G:(DE-HGF)POF3-513$$a513 - Supercomputer Facility (POF3-513)$$cPOF3-513$$fPOF III$$x0
000281260 536__ $$0G:(DE-HGF)POF2-41G21$$a41G - Supercomputer Facility (POF2-41G21)$$cPOF2-41G21$$fPOF II$$x1
000281260 588__ $$aDataset connected to CrossRef Conference
000281260 7001_ $$0P:(DE-Juel1)145740$$aSchlottke, Michael$$b1
000281260 7001_ $$0P:(DE-Juel1)157723$$aAdinets, Andrey$$b2
000281260 7001_ $$0P:(DE-Juel1)144441$$aPleiter, Dirk$$b3
000281260 773__ $$a10.1109/WACCPD.2014.11
000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.pdf$$yRestricted
000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.gif?subformat=icon$$xicon$$yRestricted
000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-1440$$xicon-1440$$yRestricted
000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-180$$xicon-180$$yRestricted
000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-640$$xicon-640$$yRestricted
000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.pdf?subformat=pdfa$$xpdfa$$yRestricted
000281260 909CO $$ooai:juser.fz-juelich.de:281260$$pVDB
000281260 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)145740$$aForschungszentrum Jülich GmbH$$b1$$kFZJ
000281260 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144441$$aForschungszentrum Jülich GmbH$$b3$$kFZJ
000281260 9132_ $$0G:(DE-HGF)POF3-519H$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data $$vAddenda$$x0
000281260 9131_ $$0G:(DE-HGF)POF3-513$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vSupercomputer Facility$$x0
000281260 9131_ $$0G:(DE-HGF)POF2-41G21$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vSupercomputer Facility$$x1
000281260 9141_ $$y2015
000281260 915__ $$0StatID:(DE-HGF)0550$$2StatID$$aNo Authors Fulltext
000281260 920__ $$lyes
000281260 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000281260 980__ $$acontrib
000281260 980__ $$aVDB
000281260 980__ $$aUNRESTRICTED
000281260 980__ $$aI:(DE-Juel1)JSC-20090406