000875265 001__ 875265
000875265 005__ 20250314084118.0
000875265 0247_ $$2doi$$a10.14529/jsfi200105
000875265 0247_ $$2ISSN$$a2313-8734
000875265 0247_ $$2ISSN$$a2409-6008
000875265 0247_ $$2Handle$$a2128/24821
000875265 037__ $$aFZJ-2020-01909
000875265 082__ $$a004
000875265 1001_ $$0P:(DE-Juel1)132163$$aKnobloch, Michael$$b0$$eCorresponding author$$ufzj
000875265 245__ $$aTools for GPU Computing - Debugging and Performance Analysis of Heterogenous HPC Applications
000875265 260__ $$aChelyabinsk$$bSouth Ural State University$$c2020
000875265 3367_ $$2DRIVER$$aarticle
000875265 3367_ $$2DataCite$$aOutput Types/Journal article
000875265 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1588869157_17986
000875265 3367_ $$2BibTeX$$aARTICLE
000875265 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000875265 3367_ $$00$$2EndNote$$aJournal Article
000875265 520__ $$aGeneral-purpose GPUs are now ubiquitous in high-end supercomputing. All but one (the Japanese Fugaku system, which is based on ARM processors) of the announced (pre-)exascale systems contain large numbers of GPUs that deliver the majority of the performance of these systems. Thus, GPU programming will be a necessity for application developers using high-end HPC systems. However, programming GPUs efficiently is an even more daunting task than traditional HPC application development. This becomes even more apparent for large-scale systems containing thousands of GPUs. Orchestrating all the resources of such a system poses a tremendous challenge for developers. Luckily, a rich ecosystem of tools exists to assist developers in every development step of a GPU application at all scales. In this paper we present an overview of these tools and discuss their capabilities. We start with an overview of different GPU programming models, from low-level with CUDA over pragma-based models like OpenACC to high-level approaches like Kokkos. We discuss their respective tool interfaces as the main method for tools to obtain information on the execution of a kernel on the GPU. The main focus of this paper is on two classes of tools, debuggers and performance analysis tools. Debuggers help the developer to identify problems both on the CPU and GPU side as well as in the interplay of both. Once the application runs correctly, performance analysis tools can be used to pinpoint bottlenecks in the execution of the code and help to increase the overall performance.
000875265 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000875265 536__ $$0G:(EU-Grant)824080$$aPOP2 - Performance Optimisation and Productivity 2 (824080)$$c824080$$fH2020-INFRAEDI-2018-1$$x1
000875265 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x2
000875265 588__ $$aDataset connected to CrossRef
000875265 7001_ $$0P:(DE-Juel1)132199$$aMohr, Bernd$$b1$$ufzj
000875265 773__ $$0PERI:(DE-600)2809718-X$$a10.14529/jsfi200105$$gVol. 7, no. 1$$n1$$p91-111$$tSupercomputing frontiers and innovations$$v7$$x2313-8734$$y2020
000875265 8564_ $$uhttps://juser.fz-juelich.de/record/875265/files/paper.pdf$$yOpenAccess
000875265 8564_ $$uhttps://juser.fz-juelich.de/record/875265/files/paper.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000875265 909CO $$ooai:juser.fz-juelich.de:875265$$pdnbdelivery$$pec_fundedresources$$pVDB$$pdriver$$popen_access$$popenaire
000875265 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132163$$aForschungszentrum Jülich$$b0$$kFZJ
000875265 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132199$$aForschungszentrum Jülich$$b1$$kFZJ
000875265 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000875265 9141_ $$y2020
000875265 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000875265 915__ $$0LIC:(DE-HGF)CCBYNV$$2V:(DE-HGF)$$aCreative Commons Attribution CC BY (No Version)$$bDOAJ
000875265 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal
000875265 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ
000875265 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000875265 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Peer review
000875265 920__ $$lno
000875265 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000875265 980__ $$ajournal
000875265 980__ $$aVDB
000875265 980__ $$aUNRESTRICTED
000875265 980__ $$aI:(DE-Juel1)JSC-20090406
000875265 9801_ $$aFullTexts