001     875265
005     20250314084118.0
024 7 _ |a 10.14529/jsfi200105
|2 doi
024 7 _ |a 2313-8734
|2 ISSN
024 7 _ |a 2409-6008
|2 ISSN
024 7 _ |a 2128/24821
|2 Handle
037 _ _ |a FZJ-2020-01909
082 _ _ |a 004
100 1 _ |a Knobloch, Michael
|0 P:(DE-Juel1)132163
|b 0
|e Corresponding author
|u fzj
245 _ _ |a Tools for GPU Computing - Debugging and Performance Analysis of Heterogenous HPC Applications
260 _ _ |a Chelyabinsk
|c 2020
|b South Ural State University
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1588869157_17986
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a General purpose GPUs are now ubiquitous in high-end supercomputing. All but one (the Japanese Fugaku system, which is based on ARM processors) of the announced (pre-)exascale systems contain vast amounts of GPUs that deliver the majority of the performance of these systems. Thus, GPU programming will be a necessity for application developers using high-end HPC systems. However, programming GPUs efficiently is an even more daunting task than traditional HPC application development. This becomes even more apparent for large-scale systems containing thousands of GPUs. Orchestrating all the resources of such a system imposes a tremendous challenge to developers. Luckily a rich ecosystem of tools exist to assist developers in every development step of a GPU application at all scales. In this paper we present an overview of these tools and discuss their capabilities. We start with an overview of different GPU programming models, from low-level with CUDA over pragma-based models like OpenACC to high-level approaches like Kokkos. We discuss their respective tool interfaces as the main method for tools to obtain information on the execution of a kernel on the GPU. The main focus of this paper is on two classes of tools, debuggers and performance analysis tools. Debuggers help the developer to identify problems both on the CPU and GPU side as well as in the interplay of both. Once the application runs correctly, performance analysis tools can be used to pinpoint bottlenecks in the execution of the code and help to increase the overall performance.
536 _ _ |a 511 - Computational Science and Mathematical Methods (POF3-511)
|0 G:(DE-HGF)POF3-511
|c POF3-511
|f POF III
|x 0
536 _ _ |a POP2 - Performance Optimisation and Productivity 2 (824080)
|0 G:(EU-Grant)824080
|c 824080
|f H2020-INFRAEDI-2018-1
|x 1
536 _ _ |0 G:(DE-Juel-1)ATMLPP
|a ATMLPP - ATML Parallel Performance (ATMLPP)
|c ATMLPP
|x 2
588 _ _ |a Dataset connected to CrossRef
700 1 _ |a Mohr, Bernd
|0 P:(DE-Juel1)132199
|b 1
|u fzj
773 _ _ |a 10.14529/jsfi200105
|g Vol. 7, no. 1
|0 PERI:(DE-600)2809718-X
|n 1
|p 91-111
|t Supercomputing frontiers and innovations
|v 7
|y 2020
|x 2313-8734
856 4 _ |y OpenAccess
|u https://juser.fz-juelich.de/record/875265/files/paper.pdf
856 4 _ |y OpenAccess
|x pdfa
|u https://juser.fz-juelich.de/record/875265/files/paper.pdf?subformat=pdfa
909 C O |o oai:juser.fz-juelich.de:875265
|p openaire
|p open_access
|p driver
|p VDB
|p ec_fundedresources
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)132163
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)132199
913 1 _ |a DE-HGF
|b Key Technologies
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-511
|2 G:(DE-HGF)POF3-500
|v Computational Science and Mathematical Methods
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF3
|l Supercomputing & Big Data
914 1 _ |y 2020
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
915 _ _ |a Creative Commons Attribution CC BY (No Version)
|0 LIC:(DE-HGF)CCBYNV
|2 V:(DE-HGF)
|b DOAJ
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0501
|2 StatID
|b DOAJ Seal
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0500
|2 StatID
|b DOAJ
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Peer Review
|0 StatID:(DE-HGF)0030
|2 StatID
|b DOAJ : Peer review
920 _ _ |l no
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 1 _ |a FullTexts

