000885773 001__ 885773
000885773 005__ 20250314084119.0
000885773 020__ $$a978-0-7381-1070-7
000885773 020__ $$a978331998696
000885773 0247_ $$2GVK$$aGVK:1033558869
000885773 0247_ $$2doi$$a10.1109/HUSTProtools51951.2020.00014
000885773 0247_ $$2Handle$$a2128/27434
000885773 0247_ $$2WOS$$aWOS:000679395600007
000885773 037__ $$aFZJ-2020-04080
000885773 041__ $$aEnglish
000885773 1001_ $$0P:(DE-Juel1)132302$$aWylie, Brian J. N.$$b0$$eCorresponding author$$ufzj
000885773 1112_ $$aWorkshop on Programming and Performance Visualization Tools$$conline$$d2020-11-12 - 2020-11-12$$gProTools '20$$wonline
000885773 245__ $$aExascale potholes for HPC: Execution performance and variability analysis of the flagship application code HemeLB
000885773 260__ $$bIEEE$$c2020
000885773 29510 $$aProceedings of 2020 IEEE/ACM International Workshop on HPC User Support Tools (HUST) and the Workshop on Programming and Performance Visualization Tools (ProTools)
000885773 300__ $$a59-70
000885773 3367_ $$2ORCID$$aCONFERENCE_PAPER
000885773 3367_ $$033$$2EndNote$$aConference Paper
000885773 3367_ $$2BibTeX$$aINPROCEEDINGS
000885773 3367_ $$2DRIVER$$aconferenceObject
000885773 3367_ $$2DataCite$$aOutput Types/Conference Paper
000885773 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1615975892_26978
000885773 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
000885773 520__ $$aPerformance measurement and analysis of parallel applications is often challenging, despite many excellent commercial and open-source tools being available. Currently envisaged exascale computer systems exacerbate matters by requiring extremely high scalability to effectively exploit millions of processor cores. Unfortunately, significant application execution performance variability arising from increasingly complex interactions between hardware and system software makes this situation much more difficult for application developers and performance analysts alike. This work considers the performance assessment of the HemeLB exascale flagship application code from the EU HPC Centre of Excellence (CoE) for Computational Biomedicine (CompBioMed) running on the SuperMUC-NG Tier-0 leadership system, using the methodology of the Performance Optimisation and Productivity (POP) CoE. Although 80% scaling efficiency is maintained to over 100,000 MPI processes, disappointing initial performance with more processes and correspondingly poor strong scaling were identified as originating from the same few compute nodes in multiple runs, which later system diagnostic checks found had faulty DIMMs and lacklustre performance. Excluding these compute nodes from subsequent runs improved the performance of executions with over 300,000 MPI processes by a factor of five, resulting in a 190x speed-up compared to 864 MPI processes. While communication efficiency remains very good up to the largest scale, parallel efficiency is primarily limited by load imbalance, found to be largely due to core-to-core and run-to-run variability from excessive stalls for memory accesses, which affect many HPC systems with Intel Xeon Scalable processors. The POP methodology for this performance diagnosis is demonstrated via a detailed exposition with widely deployed 'standard' measurement and analysis tools.
000885773 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000885773 536__ $$0G:(EU-Grant)824080$$aPOP2 - Performance Optimisation and Productivity 2 (824080)$$c824080$$fH2020-INFRAEDI-2018-1$$x1
000885773 536__ $$0G:(EU-Grant)675451$$aCompBioMed - A Centre of Excellence in Computational Biomedicine (675451)$$c675451$$fH2020-EINFRA-2015-1$$x2
000885773 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x3
000885773 588__ $$aDataset connected to GVK
000885773 773__ $$a10.1109/HUSTProtools51951.2020.00014
000885773 8564_ $$uhttps://juser.fz-juelich.de/record/885773/files/ProTools20_A4.pdf$$yOpenAccess
000885773 8564_ $$uhttps://juser.fz-juelich.de/record/885773/files/Publisher%27s%20version.pdf
000885773 909CO $$ooai:juser.fz-juelich.de:885773$$pdnbdelivery$$pec_fundedresources$$pVDB$$pdriver$$popen_access$$popenaire
000885773 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132302$$aForschungszentrum Jülich$$b0$$kFZJ
000885773 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000885773 9132_ $$0G:(DE-HGF)POF4-899$$1G:(DE-HGF)POF4-890$$2G:(DE-HGF)POF4-800$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bProgrammungebundene Forschung$$lohne Programm$$vohne Topic$$x0
000885773 9141_ $$y2020
000885773 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000885773 920__ $$lyes
000885773 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Centre$$x0
000885773 980__ $$acontrib
000885773 980__ $$aVDB
000885773 980__ $$aUNRESTRICTED
000885773 980__ $$acontb
000885773 980__ $$aI:(DE-Juel1)JSC-20090406
000885773 9801_ $$aFullTexts