000844062 001__ 844062
000844062 005__ 20250314084117.0
000844062 0247_ $$2Handle$$a2128/17545
000844062 0247_ $$2URN$$aurn:nbn:de:0001-2018012504
000844062 0247_ $$2ISSN$$a1868-8489
000844062 020__ $$a978-3-95806-297-9
000844062 037__ $$aFZJ-2018-01571
000844062 041__ $$aEnglish
000844062 1001_ $$0P:(DE-Juel1)168253$$aHermanns, Marc-André$$b0$$eCorresponding author$$gmale$$ufzj
000844062 245__ $$aUnderstanding the formation of wait states in one-sided communication$$f- 2017-12-04
000844062 260__ $$aJülich$$bForschungszentrum Jülich GmbH Zentralbibliothek, Verlag$$c2018
000844062 300__ $$axiv, 144 S.
000844062 3367_ $$2DataCite$$aOutput Types/Dissertation
000844062 3367_ $$0PUB:(DE-HGF)3$$2PUB:(DE-HGF)$$aBook$$mbook
000844062 3367_ $$2ORCID$$aDISSERTATION
000844062 3367_ $$2BibTeX$$aPHDTHESIS
000844062 3367_ $$02$$2EndNote$$aThesis
000844062 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis$$bphd$$mphd$$s1520864185_25960
000844062 3367_ $$2DRIVER$$adoctoralThesis
000844062 4900_ $$aSchriften des Forschungszentrums Jülich. Reihe IAS$$v35
000844062 502__ $$aRWTH Aachen, Diss., 2017$$bDr.$$cRWTH Aachen$$d2017
000844062 520__ $$aDue to the available concurrency in modern-day supercomputers, the complexity of developing efficient parallel applications for these platforms has grown rapidly in recent years. Many applications use message passing for parallelization, which offers three main communication paradigms: point-to-point, collective, and one-sided communication. Each paradigm fits certain domains of algorithms and communication patterns best. The one-sided paradigm decouples communication and synchronization and allows a single process to define a complete communication. These are important features for runtime systems of new programming paradigms and state-of-the-art dynamic load-balancing strategies. In any process interaction, wait states can occur, where a process idles while waiting for another before it proceeds with its local computation. To eliminate such wait states, runtime and application developers alike need support in detecting and quantifying them and their root causes. However, tool support for identifying complex wait states in one-sided communication is scarce. This thesis contributes novel methods for the scalable detection and quantification of wait states in one-sided communication, the automatic identification of their root causes, and the assessment of optimization potential. The methods for wait-state detection and quantification, as introduced by Böhme et al. and extended by this thesis, build upon a parallel post-mortem traversal of process-local event traces, which model an application's runtime behavior. Performance-relevant data is exchanged just in time on the recorded communication paths. Owing to the nature of one-sided communication, information on such communication paths is not available on all processes involved, impeding the use of this original approach for one-sided communication.
The use of a novel high-level messaging framework enables the exchange of messages on the implicit communication paths of one-sided communication, while retaining the scalability of the original approach. This enables the identification of previously unstudied types of wait states unique to one-sided communication: lack of remote progress and resource contention. Beyond simple accounting of waiting time, other contributed methods allow pinpointing root causes of such wait states and identifying optimization potential in one-sided applications. Furthermore, they distinguish two fundamentally different classes of wait-state root causes: delays for direct process synchronization (similar to point-to-point and collective communication) and contention in case of lock-based process synchronization, whose resolution strategies are diametrically opposed to each other. Finally, the contributed methods enable the identification of the longest wait-state-free execution path (i.e., critical path) in parallel applications using one-sided communication. As only optimization of functions on the critical path will yield performance improvements, its identification is key to choosing promising optimization targets. All of these methods are integrated into the Scalasca performance toolset. Their scalability and effectiveness are demonstrated by evaluating a variety of applications using one-sided communication interfaces running in configurations with up to 65,536 processes.
000844062 536__ $$0G:(DE-HGF)POF3-511$$a511 - Computational Science and Mathematical Methods (POF3-511)$$cPOF3-511$$fPOF III$$x0
000844062 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x1
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_35_thesis.pdf$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_35_thesis.gif?subformat=icon$$xicon$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_35_thesis.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_35_thesis.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_35_thesis.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_35_thesis.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_Series_35.pdf$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_Series_35.gif?subformat=icon$$xicon$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_Series_35.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_Series_35.jpg?subformat=icon-700$$xicon-700$$yOpenAccess
000844062 8564_ $$uhttps://juser.fz-juelich.de/record/844062/files/IAS_Series_35.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000844062 909CO $$ooai:juser.fz-juelich.de:844062$$pdnbdelivery$$pVDB$$pdriver$$purn$$popen_access$$popenaire
000844062 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000844062 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000844062 9141_ $$y2018
000844062 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)168253$$aForschungszentrum Jülich$$b0$$kFZJ
000844062 9131_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000844062 920__ $$lyes
000844062 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000844062 980__ $$aphd
000844062 980__ $$aVDB
000844062 980__ $$aUNRESTRICTED
000844062 980__ $$abook
000844062 980__ $$aI:(DE-Juel1)JSC-20090406
000844062 9801_ $$aFullTexts