000021706 001__ 21706
000021706 005__ 20250314084101.0
000021706 0247_ $$2ISSN$$a1868-8489
000021706 0247_ $$2urn$$aurn:nbn:de:0001-2012062204
000021706 0247_ $$2Handle$$a2128/4603
000021706 020__ $$a978-3-89336-798-6
000021706 037__ $$aPreJuSER-21706
000021706 041__ $$aEnglish
000021706 082__ $$a500
000021706 082__ $$a600
000021706 1001_ $$0P:(DE-Juel1)VDB107247$$aSzebenyi, Zoltán Péter$$b0$$eCorresponding author$$gmale$$uFZJ
000021706 245__ $$aCapturing Parallel Performance Dynamics
000021706 260__ $$aJülich$$bForschungszentrum Jülich GmbH Zentralbibliothek, Verlag$$c2012
000021706 300__ $$aXXI, 192 S.
000021706 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis
000021706 3367_ $$0PUB:(DE-HGF)3$$2PUB:(DE-HGF)$$aBook
000021706 3367_ $$02$$2EndNote$$aThesis
000021706 3367_ $$2DRIVER$$adoctoralThesis
000021706 3367_ $$2BibTeX$$aPHDTHESIS
000021706 3367_ $$2DataCite$$aOutput Types/Dissertation
000021706 3367_ $$2ORCID$$aDISSERTATION
000021706 4900_ $$0PERI:(DE-600)2525100-4$$aSchriften des Forschungszentrums Jülich. IAS Series$$v12
000021706 502__ $$aRWTH Aachen, Diss., 2012$$bDr. (FH)$$cRWTH Aachen$$d2012
000021706 500__ $$aRecord converted from JUWEL: 18.07.2013
000021706 500__ $$aRecord converted from VDB: 12.11.2012
000021706 520__ $$aSupercomputers play a key role in countless areas of science and engineering, enabling the development of new insights and technological advances never possible before. The strategic importance and ever-growing complexity of the efficient usage of supercomputing resources makes application performance analysis invaluable for the development of parallel codes. Runtime call-path profiling is a conventional, well-known method used for collecting summary statistics of an execution such as the time spent in different call paths of the code. However, these kinds of measurements only give the user a summary overview of the entire execution, without regard to changes in performance behavior over time. The possible causes of temporal changes are quite numerous, ranging from adaptive workload balancing through periodically executed extra work or distinct computational phases to system noise. As present day scientific applications tend to be run for extended periods of time, understanding the patterns and trends in the performance data along the time axis becomes crucial. A straightforward approach is profiling every iteration of the main loop separately. As shown by our analysis of a representative set of scientific codes, such measurements provide a wealth of new data that often leads to invaluable new insights. However, the introduction of the time dimension makes the amount of data collected proportional to the number of iterations, and memory usage and file sizes grow considerably. To counter this problem, a low-overhead online compression algorithm was developed that requires only a fraction of the memory and file sizes needed for an uncompressed measurement. By exploiting similarities between different iterations, the lossy compression algorithm allows all the relevant temporal patterns of the performance behavior to be reconstructed.
While standard, direct instrumentation, which is assumed by the initial version of the compression algorithm, results in fairly low overhead with many scientific codes, in some cases the high frequency of events (e.g., tiny C++ member function calls) makes such measurements impractical. To overcome this problem, a sampling-based methodology could be used instead, where the amount of measurement overhead becomes a function of the sampling frequency, independent of the function-call frequency. However, sampling alone is insufficient for our purposes, as it does not provide access to the communication metrics the compression algorithm heavily depends on. Therefore, a hybrid solution was developed that seamlessly integrates both types of measurement techniques in a single unified measurement, using direct instrumentation for message passing constructs, while sampling the rest of the code. Finally, the compression algorithm was adapted to the hybrid profiling approach, avoiding the overhead of pure direct instrumentation. Evaluation of the above methodologies shows that our semantics-based compression algorithm provides a very good approximation of the original data with very little measurement dilation, while the hybrid combination of sampling and direct instrumentation fulfills its purpose by showing the expected reduction of measurement dilation in cases unsuitable for direct instrumentation. Beyond testing with standardized benchmark suites, the usefulness of these techniques was demonstrated by their key role in gaining important new insights into the performance characteristics of real-world applications.
000021706 536__ $$0G:(DE-Juel1)FUEK411$$2G:(DE-HGF)$$aScientific Computing (FUEK411)$$cFUEK411$$x0
000021706 536__ $$0G:(DE-HGF)POF2-411$$a411 - Computational Science and Mathematical Methods (POF2-411)$$cPOF2-411$$fPOF II$$x1
000021706 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x2
000021706 655_7 $$aHochschulschrift$$xDissertation (FH)
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.pdf$$yOpenAccess
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000021706 909CO $$ooai:juser.fz-juelich.de:21706$$pdnbdelivery$$pVDB$$pdriver$$purn$$popen_access$$popenaire
000021706 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000021706 9141_ $$y2012
000021706 9132_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000021706 9131_ $$0G:(DE-HGF)POF2-411$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vComputational Science and Mathematical Methods$$x1
000021706 920__ $$lyes
000021706 9201_ $$0I:(DE-Juel1)JSC-20090406$$gJSC$$kJSC$$lJülich Supercomputing Centre$$x0
000021706 970__ $$aVDB:(DE-Juel1)137764
000021706 980__ $$aVDB
000021706 980__ $$aConvertedRecord
000021706 980__ $$aphd
000021706 980__ $$aI:(DE-Juel1)JSC-20090406
000021706 980__ $$aUNRESTRICTED
000021706 980__ $$aJUWEL
000021706 980__ $$aFullTexts
000021706 9801_ $$aFullTexts