%0 Journal Article
%A Wylie, Brian J. N.
%A Feld, Christian
%A Geimer, Markus
%A Llort, Germán
%A Mendez, Sandra
%A Mercadal, Estanislao
%A Visser, Anke
%A García-Gasulla, Marta
%T 15+ years of joint parallel application performance analysis/tools training with Scalasca/Score-P and Paraver/Extrae toolsets
%J Future Generation Computer Systems
%V 162
%@ 0167-739X
%C Amsterdam [u.a.]
%I Elsevier Science
%M FZJ-2024-05442
%P 107472
%D 2025
%Z Keywords: Hybrid parallel programming; MPI message-passing; OpenMP multithreading; OpenACC device offload acceleration; HPC application execution performance measurement & analysis; Performance assessment & optimisation methodology & tools; Hands-on training & coaching
%X The diverse landscape of distributed heterogeneous computer systems currently available and being created to address computational challenges with the highest performance requirements presents daunting complexity for application developers. They must effectively decompose and distribute their application functionality and data, efficiently orchestrating the associated communication and synchronisation, on multi/manycore CPU processors with multiple attached acceleration devices structured within compute nodes with interconnection networks of various topologies. Sophisticated compilers, runtime systems and libraries are (loosely) matched with debugging, performance measurement and analysis tools, with proprietary versions by integrators/vendors provided exclusively for their systems complemented by portable (primarily) open-source equivalents developed and supported by the international research community over many years. The Scalasca and Paraver toolsets are two widely employed examples of the latter, installed on personal notebook computers through to the largest leadership HPC systems. Over more than fifteen years their developers have worked closely together in numerous collaborative projects culminating in the creation of a universal parallel performance assessment and optimisation methodology focused on application execution efficiency and scalability, and the associated training and coaching of application developers (often in teams) in its productive use, reviewed in this article with lessons learnt therefrom.
%F PUB:(DE-HGF)16
%9 Journal Article
%U <Go to ISI:>//WOS:001294686400001
%R 10.1016/j.future.2024.07.050
%U https://juser.fz-juelich.de/record/1030735