000811713 001__ 811713
000811713 005__ 20250314084114.0
000811713 0247_ $$2doi$$a10.1145/2934661
000811713 037__ $$aFZJ-2016-04097
000811713 041__ $$aEnglish
000811713 1001_ $$0P:(DE-HGF)0$$aBöhme, David$$b0$$eCorresponding author
000811713 245__ $$aIdentifying the Root Causes of Wait States in Large-Scale Parallel Applications
000811713 260__ $$aNew York, NY$$bACM Association for Computing Machinery$$c2016
000811713 3367_ $$2DRIVER$$aarticle
000811713 3367_ $$2DataCite$$aOutput Types/Journal article
000811713 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1470211850_15478
000811713 3367_ $$2BibTeX$$aARTICLE
000811713 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000811713 3367_ $$00$$2EndNote$$aJournal Article
000811713 520__ $$aDriven by growing application requirements and accelerated by current trends in microprocessor design, the number of processor cores on modern supercomputers is increasing from generation to generation. However, load or communication imbalance prevents many codes from taking advantage of the available parallelism, as delays of single processes may spread wait states across the entire machine. Moreover, when employing complex point-to-point communication patterns, wait states may propagate along far-reaching cause-effect chains that are hard to track manually and that complicate an assessment of the actual costs of an imbalance. Building on earlier work by Meira Jr. et al., we present a scalable approach that identifies program wait states and attributes their costs in terms of resource waste to their original cause. By replaying event traces in parallel both forward and backward, we can identify the processes and call paths responsible for the most severe imbalances even for runs with hundreds of thousands of processes.
000811713 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000811713 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x1
000811713 588__ $$aDataset connected to CrossRef
000811713 7001_ $$0P:(DE-Juel1)132112$$aGeimer, Markus$$b1$$ufzj
000811713 7001_ $$0P:(DE-Juel1)132044$$aArnold, Lukas$$b2$$ufzj
000811713 7001_ $$0P:(DE-HGF)0$$aVoigtlaender, Felix$$b3
000811713 7001_ $$0P:(DE-HGF)0$$aWolf, Felix$$b4
000811713 773__ $$0PERI:(DE-600)2845845-X$$a10.1145/2934661$$gVol. 3, no. 2, p. 1 - 24$$n2$$p11$$tACM Transactions on Parallel Computing$$v3$$x2374-0353$$y2016
000811713 8564_ $$uhttps://juser.fz-juelich.de/record/811713/files/TOPC-201607-03-02-11.pdf$$yRestricted
000811713 8564_ $$uhttps://juser.fz-juelich.de/record/811713/files/TOPC-201607-03-02-11.gif?subformat=icon$$xicon$$yRestricted
000811713 8564_ $$uhttps://juser.fz-juelich.de/record/811713/files/TOPC-201607-03-02-11.jpg?subformat=icon-1440$$xicon-1440$$yRestricted
000811713 8564_ $$uhttps://juser.fz-juelich.de/record/811713/files/TOPC-201607-03-02-11.jpg?subformat=icon-180$$xicon-180$$yRestricted
000811713 8564_ $$uhttps://juser.fz-juelich.de/record/811713/files/TOPC-201607-03-02-11.jpg?subformat=icon-640$$xicon-640$$yRestricted
000811713 909CO $$ooai:juser.fz-juelich.de:811713$$pVDB
000811713 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132112$$aForschungszentrum Jülich$$b1$$kFZJ
000811713 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132044$$aForschungszentrum Jülich$$b2$$kFZJ
000811713 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000811713 9141_ $$y2016
000811713 920__ $$lyes
000811713 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000811713 980__ $$ajournal
000811713 980__ $$aVDB
000811713 980__ $$aI:(DE-Juel1)JSC-20090406
000811713 980__ $$aUNRESTRICTED