000281260 001__ 281260 000281260 005__ 20210129221655.0 000281260 0247_ $$2doi$$a10.1109/WACCPD.2014.11 000281260 0247_ $$2altmetric$$aaltmetric:21828280 000281260 037__ $$aFZJ-2016-00959 000281260 041__ $$aEnglish 000281260 1001_ $$0P:(DE-HGF)0$$aKraus, Jiri$$b0 000281260 1112_ $$a2014 First Workshop on Accelerator Programming using Directives (WACCPD)$$cNew Orleans$$d2014-11-17 - 2014-11-17$$wLA 000281260 245__ $$aAccelerating a C++ CFD Code with OpenACC 000281260 260__ $$bIEEE$$c2014 000281260 29510 $$a2014 First Workshop on Accelerator Programming using Directives : [Proceedings] - ISBN 978-1-4673-6753-0 - 000281260 300__ $$a47-54 000281260 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1454505599_9274 000281260 3367_ $$033$$2EndNote$$aConference Paper 000281260 3367_ $$2ORCID$$aCONFERENCE_PAPER 000281260 3367_ $$2DataCite$$aOutput Types/Conference Paper 000281260 3367_ $$2DRIVER$$aconferenceObject 000281260 3367_ $$2BibTeX$$aINPROCEEDINGS 000281260 500__ $$3POF3_Assignment on 2016-02-29 000281260 520__ $$aToday's HPC systems are increasingly utilizing accelerators to lower time to solution for their users and reduce power consumption. To utilize the higher performance and energy efficiency of these accelerators, application developers need to rewrite at least parts of their codes. Taking the C++ flow solver ZFS as an example, we show that the directive-based programming model allows one to achieve good performance with reasonable effort, even for mature codes with many lines of code. Using OpenACC directives permitted us to incrementally accelerate ZFS, focusing on the parts of the program that are relevant for the problem at hand. The two new OpenACC 2.0 features, unstructured data regions and atomics, are required for this.
OpenACC's interoperability with existing GPU libraries via the host_data use_device construct allowed us to use CUDA-aware MPI to achieve multi-GPU scalability comparable to the CPU version of ZFS. Like many other codes, the data structures of ZFS have been designed with traditional CPUs and their relatively large private caches in mind. This leads to suboptimal memory access patterns on accelerators, such as GPUs. We show how the texture cache on NVIDIA GPUs can be used to minimize the performance impact of these suboptimal patterns without writing platform-specific code. For the kernel most affected by the memory access pattern, we compare the initial array-of-structures memory layout with a structure-of-arrays layout. 000281260 536__ $$0G:(DE-HGF)POF3-513$$a513 - Supercomputer Facility (POF3-513)$$cPOF3-513$$fPOF III$$x0 000281260 536__ $$0G:(DE-HGF)POF2-41G21$$a41G - Supercomputer Facility (POF2-41G21)$$cPOF2-41G21$$fPOF II$$x1 000281260 588__ $$aDataset connected to CrossRef Conference 000281260 7001_ $$0P:(DE-Juel1)145740$$aSchlottke, Michael$$b1 000281260 7001_ $$0P:(DE-Juel1)157723$$aAdinets, Andrey$$b2 000281260 7001_ $$0P:(DE-Juel1)144441$$aPleiter, Dirk$$b3 000281260 773__ $$a10.1109/WACCPD.2014.11 000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.pdf$$yRestricted 000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.gif?subformat=icon$$xicon$$yRestricted 000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-1440$$xicon-1440$$yRestricted 000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-180$$xicon-180$$yRestricted 000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.jpg?subformat=icon-640$$xicon-640$$yRestricted 000281260 8564_ $$uhttps://juser.fz-juelich.de/record/281260/files/07081677.pdf?subformat=pdfa$$xpdfa$$yRestricted 000281260 909CO $$ooai:juser.fz-juelich.de:281260$$pVDB 000281260 9101_
$$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)145740$$aForschungszentrum Jülich GmbH$$b1$$kFZJ 000281260 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144441$$aForschungszentrum Jülich GmbH$$b3$$kFZJ 000281260 9132_ $$0G:(DE-HGF)POF3-519H$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data $$vAddenda$$x0 000281260 9131_ $$0G:(DE-HGF)POF3-513$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vSupercomputer Facility$$x0 000281260 9131_ $$0G:(DE-HGF)POF2-41G21$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vSupercomputer Facility$$x1 000281260 9141_ $$y2015 000281260 915__ $$0StatID:(DE-HGF)0550$$2StatID$$aNo Authors Fulltext 000281260 920__ $$lyes 000281260 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0 000281260 980__ $$acontrib 000281260 980__ $$aVDB 000281260 980__ $$aUNRESTRICTED 000281260 980__ $$aI:(DE-Juel1)JSC-20090406
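The abstract names two OpenACC 2.0 features as essential for the incremental port: unstructured data regions and atomics. A minimal sketch of both follows; it is an illustration of these directives, not code from ZFS. Without an OpenACC compiler the pragmas are ignored and the loop simply runs on the host.

```cpp
#include <vector>

// Count how many values fall into each bin, then return the total count.
// Unstructured data regions (enter/exit data) decouple device-data lifetime
// from lexical scope, which suits incremental porting of a large C++ code;
// the atomic directive handles several iterations updating the same bin.
int histogram_total(const std::vector<int>& bins_in, int nbins) {
    std::vector<int> counts(nbins, 0);
    int* c = counts.data();
    const int* b = bins_in.data();
    const int n = static_cast<int>(bins_in.size());

    // Unstructured data region: no enclosing braces required.
    #pragma acc enter data copyin(b[0:n], c[0:nbins])
    #pragma acc parallel loop present(b, c)
    for (int i = 0; i < n; ++i) {
        // Several loop iterations may increment the same counter.
        #pragma acc atomic update
        c[b[i]] += 1;
    }
    #pragma acc exit data copyout(c[0:nbins]) delete(b[0:n])

    int total = 0;
    for (int k = 0; k < nbins; ++k) total += c[k];
    return total;
}
```

Because the data region is unstructured, the `enter data` and `exit data` points can live in different functions (for example, a constructor and destructor), which is what makes directive-by-directive acceleration of a mature code base practical.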
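The abstract also compares an array-of-structures (AoS) layout with a structure-of-arrays (SoA) layout for the kernel most affected by memory access patterns. The following hypothetical example (the `CellAoS`/`CellsSoA` names are ours, not ZFS's) shows the difference: with SoA, consecutive loop iterations read consecutive addresses, which GPUs can coalesce, while AoS strides by the full record size.

```cpp
#include <cstddef>
#include <vector>

// AoS: one record per cell; reading only rho strides by sizeof(CellAoS).
struct CellAoS { double rho, u, v; };

// SoA: one array per variable; reading rho is unit-stride.
struct CellsSoA {
    std::vector<double> rho, u, v;
    explicit CellsSoA(std::size_t n) : rho(n), u(n), v(n) {}
};

// Sum of density over all cells, AoS layout (stride = 3 doubles).
double total_rho_aos(const std::vector<CellAoS>& cells) {
    double s = 0.0;
    for (const CellAoS& c : cells) s += c.rho;
    return s;
}

// Same reduction, SoA layout (stride = 1 double, coalescible on a GPU).
double total_rho_soa(const CellsSoA& cells) {
    double s = 0.0;
    for (double r : cells.rho) s += r;
    return s;
}
```

Both functions compute the same result; only the traversal pattern through memory differs, which is exactly the property the paper's kernel comparison measures.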
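On using the NVIDIA texture cache "without writing platform-specific code": one common, portable way to enable this (an assumption about the technique, since the abstract does not spell it out) is to declare read-only pointers `const` and `restrict`, which lets an OpenACC compiler route their loads through the GPU's read-only/texture cache. A sketch:

```cpp
// Dot product with read-only operands. The const + __restrict__ qualifiers
// tell the compiler the arrays are never written through these pointers and
// do not alias, so an OpenACC compiler may serve the loads from the GPU's
// read-only (texture) cache; on a plain host compiler this is ordinary C++.
double dot(const double* __restrict__ a, const double* __restrict__ b, int n) {
    double s = 0.0;
    #pragma acc parallel loop reduction(+:s) copyin(a[0:n], b[0:n])
    for (int i = 0; i < n; ++i) {
        s += a[i] * b[i];
    }
    return s;
}
```

Note that `__restrict__` is a GCC/Clang spelling (standard C uses `restrict`); the point is that no CUDA-specific intrinsic is needed to get the caching benefit.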