001     152041
005     20250314084110.0
020 _ _ |a 978-1-61499-380-3
024 7 _ |a 10.3233/978-1-61499-381-0-773
|2 doi
024 7 _ |a WOS:000452120400078
|2 WOS
037 _ _ |a FZJ-2014-01861
100 1 _ |a Schlütter, Marc
|0 P:(DE-Juel1)142180
|b 0
|u fzj
111 2 _ |a International Conference on Parallel Computing
|g ParCo 2013
|c Munich
|d 2013-09-10 - 2013-09-13
|w Germany
245 _ _ |a Profiling Hybrid HMPP Applications with Score-P on Heterogeneous Hardware
260 _ _ |c 2014
|b IOS Press
295 1 0 |a Parallel Computing: Accelerating Computational Science and Engineering (CSE)
300 _ _ |a 773 - 782
336 7 _ |a Contribution to a conference proceedings
|b contrib
|m contrib
|0 PUB:(DE-HGF)8
|s 1402401709_15788
|2 PUB:(DE-HGF)
336 7 _ |a Contribution to a book
|0 PUB:(DE-HGF)7
|2 PUB:(DE-HGF)
|m contb
336 7 _ |a Conference Paper
|0 33
|2 EndNote
336 7 _ |a CONFERENCE_PAPER
|2 ORCID
336 7 _ |a Output Types/Conference Paper
|2 DataCite
336 7 _ |a conferenceObject
|2 DRIVER
336 7 _ |a INPROCEEDINGS
|2 BibTeX
490 0 _ |a Advances in Parallel Computing
|v 25
520 _ _ |a In heterogeneous environments with multi-core systems and accelerators, programming and optimizing large parallel applications turns into a time-intensive and hardware-dependent challenge. To assist application developers in this process, a number of tools and high-level compilers have been developed. Directive-based programming models such as HMPP and OpenACC provide abstractions over low-level GPU programming models, such as CUDA or OpenCL. The compilers developed by CAPS automatically transform the pragma-annotated application code into low-level code, thereby allowing the parallelization and optimization for a given accelerator hardware. To analyze the performance of parallel applications, multiple partners in Germany and the US jointly develop the community measurement infrastructure Score-P. Score-P gathers performance execution profiles, which can be presented and analyzed within the CUBE result browser, and collects detailed event traces to be processed by post-mortem analysis tools such as Scalasca and Vampir. In this paper we present the integration and combined use of Score-P and the CAPS compilers as one approach to efficiently parallelize and optimize codes. Specifically, we describe the PHMPP profiling interface, its implementation in Score-P, and the presentation of preliminary results in CUBE.
536 _ _ |a 411 - Computational Science and Mathematical Methods (POF2-411)
|0 G:(DE-HGF)POF2-411
|c POF2-411
|f POF II
|x 0
536 _ _ |0 G:(DE-Juel-1)ATMLPP
|a ATMLPP - ATML Parallel Performance (ATMLPP)
|c ATMLPP
|x 1
700 1 _ |a Philippen, Peter
|0 P:(DE-Juel1)143710
|b 1
|u fzj
700 1 _ |a Morin, Laurent
|0 P:(DE-HGF)0
|b 2
700 1 _ |a Geimer, Markus
|0 P:(DE-Juel1)132112
|b 3
|u fzj
700 1 _ |a Mohr, Bernd
|0 P:(DE-Juel1)132199
|b 4
|u fzj
773 _ _ |a 10.3233/978-1-61499-381-0-773
856 4 _ |u https://juser.fz-juelich.de/record/152041/files/FZJ-2014-01861.pdf
|y Restricted
909 C O |o oai:juser.fz-juelich.de:152041
|p VDB
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)142180
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)143710
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)132112
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 4
|6 P:(DE-Juel1)132199
913 2 _ |a DE-HGF
|b Key Technologies
|l Supercomputing & Big Data
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-511
|2 G:(DE-HGF)POF3-500
|v Computational Science and Mathematical Methods
|x 0
913 1 _ |a DE-HGF
|b Schlüsseltechnologien
|l Supercomputing
|1 G:(DE-HGF)POF2-410
|0 G:(DE-HGF)POF2-411
|2 G:(DE-HGF)POF2-400
|v Computational Science and Mathematical Methods
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF2
914 1 _ |y 2014
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a contrib
980 _ _ |a VDB
980 _ _ |a contb
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a UNRESTRICTED

