001     885773
005     20250314084119.0
020 _ _ |a 978-0-7381-1070-7
020 _ _ |a 978331998696
024 7 _ |a GVK:1033558869
|2 GVK
024 7 _ |a 10.1109/HUSTProtools51951.2020.00014
|2 doi
024 7 _ |a 2128/27434
|2 Handle
024 7 _ |a WOS:000679395600007
|2 WOS
037 _ _ |a FZJ-2020-04080
041 _ _ |a eng
100 1 _ |a Wylie, Brian J. N.
|0 P:(DE-Juel1)132302
|b 0
|e Corresponding author
|u fzj
111 2 _ |a Workshop on Programming and Performance Visualization Tools
|g ProTools '20
|c online
|d 2020-11-12 - 2020-11-12
|w online
245 _ _ |a Exascale potholes for HPC: Execution performance and variability analysis of the flagship application code HemeLB
260 _ _ |c 2020
|b IEEE
295 1 0 |a Proceedings of 2020 IEEE/ACM International Workshop on HPC User Support Tools (HUST) and the Workshop on Programming and Performance Visualization Tools (ProTools)
300 _ _ |a 59-70
336 7 _ |a CONFERENCE_PAPER
|2 ORCID
336 7 _ |a Conference Paper
|0 33
|2 EndNote
336 7 _ |a INPROCEEDINGS
|2 BibTeX
336 7 _ |a conferenceObject
|2 DRIVER
336 7 _ |a Output Types/Conference Paper
|2 DataCite
336 7 _ |a Contribution to a conference proceedings
|b contrib
|m contrib
|0 PUB:(DE-HGF)8
|s 1615975892_26978
|2 PUB:(DE-HGF)
336 7 _ |a Contribution to a book
|0 PUB:(DE-HGF)7
|2 PUB:(DE-HGF)
|m contb
520 _ _ |a Performance measurement and analysis of parallel applications is often challenging, despite the many excellent commercial and open-source tools available. Currently envisaged exascale computer systems exacerbate matters by requiring extremely high scalability to effectively exploit millions of processor cores. Unfortunately, significant application execution performance variability, arising from increasingly complex interactions between hardware and system software, makes this situation much more difficult for application developers and performance analysts alike. This work considers the performance assessment of the HemeLB exascale flagship application code from the EU HPC Centre of Excellence (CoE) for Computational Biomedicine (CompBioMed), running on the SuperMUC-NG Tier-0 leadership system, using the methodology of the Performance Optimisation and Productivity (POP) CoE. Although 80% scaling efficiency is maintained to over 100,000 MPI processes, disappointing initial performance with more processes, and correspondingly poor strong scaling, was identified in multiple runs as originating from the same few compute nodes, which later system diagnostic checks found to have faulty DIMMs and lacklustre performance. Excluding these compute nodes from subsequent runs improved the performance of executions with over 300,000 MPI processes by a factor of five, resulting in a 190x speed-up compared with 864 MPI processes. While communication efficiency remains very good up to the largest scale, parallel efficiency is primarily limited by load balance, found to be largely due to core-to-core and run-to-run variability from excessive stalls for memory accesses, which affect many HPC systems with Intel Xeon Scalable processors. The POP methodology for this performance diagnosis is demonstrated via a detailed exposition with widely deployed 'standard' measurement and analysis tools.
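The abstract quantifies scaling in terms of the POP CoE efficiency metrics. As a minimal illustration (not part of the record, and with all timings and the process count hypothetical), the following Python sketch computes strong-scaling speed-up/efficiency and the POP decomposition of parallel efficiency into load balance and communication efficiency:

    # Illustrative sketch only: POP CoE-style efficiency metrics as
    # referenced in the abstract. All inputs below are hypothetical.

    def strong_scaling(t_ref, n_ref, t_n, n):
        # Speed-up and strong-scaling efficiency relative to a reference run.
        speedup = t_ref / t_n
        efficiency = speedup * n_ref / n
        return speedup, efficiency

    def pop_parallel_efficiency(useful_times, runtime):
        # POP metrics: load balance = average/maximum per-process useful
        # compute time; communication efficiency = maximum useful time
        # divided by total runtime; parallel efficiency is their product.
        lb = (sum(useful_times) / len(useful_times)) / max(useful_times)
        comm_eff = max(useful_times) / runtime
        return lb, comm_eff, lb * comm_eff

    # Hypothetical timings reproducing the abstract's headline figure:
    # a 190x speed-up over the 864-process baseline for a run with over
    # 300,000 MPI processes (process count here is assumed, not sourced).
    s, e = strong_scaling(t_ref=1900.0, n_ref=864, t_n=10.0, n=331776)
    print(f"speed-up {s:.0f}x, scaling efficiency {e:.0%}")

With these assumed numbers the sketch reports a 190x speed-up at roughly 49% strong-scaling efficiency, consistent with the abstract's observation that efficiency above 80% holds only up to about 100,000 processes.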
536 _ _ |a 511 - Computational Science and Mathematical Methods (POF3-511)
|0 G:(DE-HGF)POF3-511
|c POF3-511
|f POF III
|x 0
536 _ _ |a POP2 - Performance Optimisation and Productivity 2 (824080)
|0 G:(EU-Grant)824080
|c 824080
|f H2020-INFRAEDI-2018-1
|x 1
536 _ _ |a CompBioMed - A Centre of Excellence in Computational Biomedicine (675451)
|0 G:(EU-Grant)675451
|c 675451
|f H2020-EINFRA-2015-1
|x 2
536 _ _ |0 G:(DE-Juel-1)ATMLPP
|a ATMLPP - ATML Parallel Performance (ATMLPP)
|c ATMLPP
|x 3
588 _ _ |a Dataset connected to GVK
650 _ 7 |a E-Government
|0 (DE-588)4728387-7
|2 gnd
773 _ _ |a 10.1109/HUSTProtools51951.2020.00014
856 4 _ |y OpenAccess
|u https://juser.fz-juelich.de/record/885773/files/ProTools20_A4.pdf
856 4 _ |u https://juser.fz-juelich.de/record/885773/files/Publisher%27s%20version.pdf
909 C O |o oai:juser.fz-juelich.de:885773
|p openaire
|p open_access
|p driver
|p VDB
|p ec_fundedresources
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)132302
913 1 _ |a DE-HGF
|b Key Technologies
|l Supercomputing & Big Data
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-511
|3 G:(DE-HGF)POF3
|2 G:(DE-HGF)POF3-500
|4 G:(DE-HGF)POF
|v Computational Science and Mathematical Methods
|x 0
913 2 _ |a DE-HGF
|b Non-programme-bound research
|l without programme
|1 G:(DE-HGF)POF4-890
|0 G:(DE-HGF)POF4-899
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-800
|4 G:(DE-HGF)POF
|v without topic
|x 0
914 1 _ |y 2020
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Centre
|x 0
980 _ _ |a contrib
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a contb
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 1 _ |a FullTexts

