Characterizing Load and Communication Imbalance in Parallel Applications

Böhme, David
000151145 001__ 151145
000151145 005__ 20250314084109.0
000151145 0247_ $$2URN$$aurn:nbn:de:0001-2014012708
000151145 0247_ $$2Handle$$a2128/5909
000151145 0247_ $$2ISSN$$a1868-8489
000151145 020__ $$a978-3-89336-940-9
000151145 037__ $$aFZJ-2014-01145
000151145 041__ $$aEnglish
000151145 1001_ $$0P:(DE-HGF)0$$aBöhme, David$$b0$$eCorresponding author$$gmale$$ufzj
000151145 245__ $$aCharacterizing Load and Communication Imbalance in Parallel Applications$$f2013-06-07
000151145 260__ $$aJülich$$bForschungszentrum Jülich GmbH Zentralbibliothek, Verlag$$c2014
000151145 300__ $$axv, 111 S.
000151145 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis$$bphd$$mphd$$s151145
000151145 3367_ $$02$$2EndNote$$aThesis
000151145 3367_ $$2DRIVER$$adoctoralThesis
000151145 3367_ $$2BibTeX$$aPHDTHESIS
000151145 3367_ $$2DataCite$$aOutput Types/Dissertation
000151145 3367_ $$2ORCID$$aDISSERTATION
000151145 4900_ $$aSchriften des Forschungszentrums Jülich. IAS Series$$v23
000151145 502__ $$aRWTH Aachen, Diss., 2013$$bDr.$$cRWTH Aachen$$d2013
000151145 500__ $$3POF3_Assignment on 2016-02-29
000151145 520__ $$aThe amount of parallelism in modern supercomputers currently grows from generation to generation. Further application performance improvements therefore depend on software-managed parallelism: the software must organize data exchange between processing elements efficiently and optimally distribute the workload between them. Performance analysis tools help developers of parallel applications to evaluate and optimize the parallel efficiency of their programs. This dissertation presents two novel methods to automatically detect imbalance-related performance problems in MPI programs and intuitively guide the performance analyst to the inefficiencies whose optimization promises the highest benefit. The first method, the delay analysis, identifies the root causes of wait states. A delay occurs when a program activity needs more time on one process than on another, which leads to the formation of wait states at a subsequent synchronization point. Wait states are the primary symptom of load imbalance in parallel programs. While wait states themselves are easy to detect, the potentially large temporal and spatial distance between wait states and the delays causing them complicates the identification of wait-state root causes. The delay analysis closes this gap, accounting for both short-term and long-term effects. The second method is based on the detection of the critical path, which determines the effect of imbalance on program runtime. The critical path is the longest execution path in a parallel program without wait states: optimizing an activity on the critical path will reduce the program’s runtime. Comparing the duration of activities on the critical path with their duration on each process yields a set of novel, compact performance indicators.
These indicators allow users to evaluate load balance, identify performance bottlenecks, and determine the performance impact of load imbalance at first glance by providing an intuitive understanding of complex performance phenomena. Both analysis methods leverage the scalable event-trace analysis technique employed by the Scalasca toolset: by replaying event traces in parallel, the bottleneck search algorithms can harness the distributed memory and computational resources of the target system for the analysis, allowing them to process even large-scale program runs. The scalability and performance insight that the novel analysis approaches provide are demonstrated by evaluating a variety of real-world HPC codes in configurations with up to 262,144 processor cores.
000151145 536__ $$0G:(DE-HGF)POF2-411$$a411 - Computational Science and Mathematical Methods (POF2-411)$$cPOF2-411$$fPOF II$$x0
000151145 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x1
000151145 650_7 $$0V:(DE-588b)4012494-0$$2GND$$aDissertation$$xDiss.
000151145 8564_ $$uhttps://juser.fz-juelich.de/record/151145/files/IAS_Series_23_PDF-A.pdf$$yOpenAccess
000151145 8564_ $$uhttps://juser.fz-juelich.de/record/151145/files/IAS_Series_23_PDF-A.ps.gz$$yOpenAccess
000151145 8564_ $$uhttps://juser.fz-juelich.de/record/151145/files/IAS_Series_23_PDF-A.gif?subformat=icon$$xicon$$yOpenAccess
000151145 8564_ $$uhttps://juser.fz-juelich.de/record/151145/files/IAS_Series_23_PDF-A.gif?subformat=icon-700$$xicon-700$$yOpenAccess
000151145 8564_ $$uhttps://juser.fz-juelich.de/record/151145/files/IAS_Series_23_PDF-A.jpg?subformat=icon-144$$xicon-144$$yOpenAccess
000151145 8564_ $$uhttps://juser.fz-juelich.de/record/151145/files/IAS_Series_23_PDF-A.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000151145 8564_ $$uhttps://juser.fz-juelich.de/record/151145/files/IAS_Series_23_PDF-A.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000151145 909CO $$ooai:juser.fz-juelich.de:151145$$pdnbdelivery$$pVDB$$pdriver$$purn$$popen_access$$popenaire
000151145 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000151145 9141_ $$y2014
000151145 9132_ $$0G:(DE-HGF)POF3-519H$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vAddenda$$x0
000151145 9131_ $$0G:(DE-HGF)POF2-411$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vComputational Science and Mathematical Methods$$x0
000151145 920__ $$lyes
000151145 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000151145 980__ $$aphd
000151145 980__ $$aUNRESTRICTED
000151145 980__ $$aJUWEL
000151145 980__ $$aFullTexts
000151145 980__ $$aI:(DE-Juel1)JSC-20090406
000151145 980__ $$aVDB
000151145 9801_ $$aFullTexts
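
The abstract's notions of wait states and the critical path can be illustrated with a small toy model. The sketch below is not the Scalasca implementation or the dissertation's algorithm. It assumes a program made of phases that each end in a barrier, with a known compute time per rank and phase. The function names `analyze` and `critical_path_imbalance` and the input layout are hypothetical and chosen only for illustration. Under that assumption, the slowest rank in each phase lies on the critical path, every other rank waits at the barrier for the difference, and comparing the critical-path time of a phase with its average per-rank time gives a simple imbalance indicator.

```python
from statistics import mean


def analyze(durations):
    """Toy model: durations[phase][rank] is the compute time of a rank in a phase.

    Assumption (not from the source): every phase ends with a barrier, so all
    ranks leave a phase together once the slowest rank has finished.
    Returns the critical path as (phase, rank, time) tuples and the total
    wait time accumulated by each rank.
    """
    num_ranks = len(durations[0])
    critical_path = []
    wait = [0.0] * num_ranks
    for k, phase in enumerate(durations):
        # The slowest rank determines when the barrier completes.
        slowest = max(range(num_ranks), key=lambda r: phase[r])
        critical_path.append((k, slowest, phase[slowest]))
        # Every other rank idles in the barrier for the time difference.
        for r in range(num_ranks):
            wait[r] += phase[slowest] - phase[r]
    return critical_path, wait


def critical_path_imbalance(durations):
    """Per phase: time on the critical path minus the average time per rank."""
    return [max(phase) - mean(phase) for phase in durations]


if __name__ == "__main__":
    # Hypothetical measurements: 2 phases, 3 ranks.
    durations = [[4.0, 2.0, 3.0], [1.0, 1.0, 4.0]]
    path, wait = analyze(durations)
    print("critical path:", path)
    print("runtime:", sum(t for _, _, t in path))
    print("wait time per rank:", wait)
    print("imbalance per phase:", critical_path_imbalance(durations))
```

In this example rank 0 holds the critical path in the first phase and rank 2 in the second, so shortening rank 1's work would not reduce the runtime at all, although rank 1 accumulates the most wait time.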