000010841 001__ 10841
000010841 005__ 20250314084058.0
000010841 0247_ $$2Handle$$a2128/3787
000010841 0247_ $$2URI$$a3787
000010841 0247_ $$2ISSN$$a1868-8489
000010841 020__ $$a978-3-89336-625-5
000010841 037__ $$aPreJuSER-10841
000010841 1001_ $$0P:(DE-Juel1)VDB62975$$aBecker, Daniel$$b0$$eCorresponding author$$gmale$$uFZJ
000010841 245__ $$aTimestamp Synchronization of Concurrent Events
000010841 260__ $$aJülich$$bForschungszentrum Jülich GmbH Zentralbibliothek, Verlag$$c2010
000010841 300__ $$aXVIII, 116 S.
000010841 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis
000010841 3367_ $$0PUB:(DE-HGF)3$$2PUB:(DE-HGF)$$aBook
000010841 3367_ $$02$$2EndNote$$aThesis
000010841 3367_ $$2DRIVER$$adoctoralThesis
000010841 3367_ $$2BibTeX$$aPHDTHESIS
000010841 3367_ $$2DataCite$$aOutput Types/Dissertation
000010841 3367_ $$2ORCID$$aDISSERTATION
000010841 4900_ $$0PERI:(DE-600)2525100-4$$aSchriften des Forschungszentrums Jülich : IAS Series$$v4
000010841 502__ $$aRWTH Aachen, Diss., 2010$$bDr. (FH)$$cRWTH Aachen$$d2010
000010841 500__ $$aRecord converted from VDB: 12.11.2012
000010841 520__ $$aSupercomputing is a key technological pillar of modern science and engineering, indispensable for solving critical problems of high complexity. However, to effectively utilize the enormously complex large-scale computer systems available today, scientists and engineers need powerful and robust software development tools. One technique widely used by such tools is event tracing with a broad spectrum of applications ranging from performance analysis, performance prediction and modeling to debugging. In particular, event traces are helpful in understanding the performance behavior of parallel programs since they allow the in-depth analysis of communication and synchronization patterns. The accuracy of such analyses depends on the comparability of timestamps taken on different processors and may be adversely affected by non-synchronized clocks leading to inaccurate relative event timings. Such inaccuracies may cause a given interval to appear shorter or longer than it actually was, or introduce violations of the logical event order, which requires a message to be received only after it has been sent. Inconsistent trace data may not only lead to false conclusions, for instance, when the impact of communication patterns is quantified, but may also confuse the user of trace-visualization tools by causing message arrows to point backward in time-line views. Even more strikingly, trace-analysis tools may also cease to work in a satisfactory manner if they rely on the correct order to function properly. Although linear offset interpolation can restore the consistency of the trace data to some degree, time-dependent drifts and other inaccuracies may still disarrange the original sequence of events, as shown in a study conducted as a part of this Ph.D. thesis.
The already familiar controlled logical clock algorithm accounts for such violations in point-to-point communication by shifting message events in time as much as needed while trying to preserve the length of local intervals. This algorithm is, however, not suitable for realistic applications because (i) it ignores collective and shared-memory operations and (ii) as a serial algorithm it offers only limited scalability. This thesis addresses these shortcomings by extending the algorithm to restore event semantics related to collective and shared-memory operations and by parallelizing the extended version to make it suitable for large-scale systems including computational grids. The basic idea behind the semantic extension is to consider collective and shared-memory operations as being composed of multiple point-to-point messages, taking the semantics of the different flavors of these operations into account. In order to accomplish the correction in a scalable way, both distributed memory and parallel processing capabilities are exploited by processing separate local trace files in parallel and replaying the original communication on as many CPUs as were used to execute the target application itself. To employ the replay mechanism in computational grids, this work also defines the necessary infrastructure to accurately measure clock offsets in distributed environments with hierarchical networks. The methodology was evaluated in practice by integrating the extended and parallelized algorithm into the Scalasca trace-analysis framework and applied to traces of realistic applications taken on single cluster systems and computational grids. The thesis shows that the algorithm eliminates inconsistent timings of concurrent events while only marginally changing the length of intervals between local events – even if wide-area communication is involved. Scalability is demonstrated with up to 4,096 application processes.
000010841 536__ $$0G:(DE-Juel1)FUEK411$$2G:(DE-HGF)$$aScientific Computing (FUEK411)$$cFUEK411$$x0
000010841 536__ $$0G:(DE-HGF)POF2-411$$a411 - Computational Science and Mathematical Methods (POF2-411)$$cPOF2-411$$fPOF II$$x1
000010841 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x2
000010841 655_7 $$aHochschulschrift$$xDissertation (FH)
000010841 8564_ $$uhttps://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.pdf$$yOpenAccess
000010841 8564_ $$uhttps://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
000010841 8564_ $$uhttps://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000010841 8564_ $$uhttps://juser.fz-juelich.de/record/10841/files/IAS%20Series_04.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000010841 909CO $$ooai:juser.fz-juelich.de:10841$$pdnbdelivery$$pVDB$$pdriver$$popen_access$$popenaire
000010841 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000010841 9141_ $$y2010
000010841 9132_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000010841 9131_ $$0G:(DE-HGF)POF2-411$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vComputational Science and Mathematical Methods$$x1
000010841 920__ $$lyes
000010841 9201_ $$0I:(DE-Juel1)JSC-20090406$$gJSC$$kJSC$$lJülich Supercomputing Centre$$x0
000010841 970__ $$aVDB:(DE-Juel1)121432
000010841 980__ $$aVDB
000010841 980__ $$aJUWEL
000010841 980__ $$aConvertedRecord
000010841 980__ $$aphd
000010841 980__ $$aI:(DE-Juel1)JSC-20090406
000010841 980__ $$aUNRESTRICTED
000010841 980__ $$aFullTexts
000010841 9801_ $$aFullTexts