000021706 001__ 21706
000021706 005__ 20250314084101.0
000021706 0247_ $$2ISSN$$a1868-8489
000021706 0247_ $$2urn$$aurn:nbn:de:0001-2012062204
000021706 0247_ $$2Handle$$a2128/4603
000021706 020__ $$a978-3-89336-798-6
000021706 037__ $$aPreJuSER-21706
000021706 041__ $$aEnglish
000021706 082__ $$a500
000021706 082__ $$a600
000021706 1001_ $$0P:(DE-Juel1)VDB107247$$aSzebenyi, Zoltán Péter$$b0$$eCorresponding author$$gmale$$uFZJ
000021706 245__ $$aCapturing Parallel Performance Dynamics
000021706 260__ $$aJülich$$bForschungszentrum Jülich GmbH Zentralbibliothek, Verlag$$c2012
000021706 300__ $$aXXI, 192 S.
000021706 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis
000021706 3367_ $$0PUB:(DE-HGF)3$$2PUB:(DE-HGF)$$aBook
000021706 3367_ $$02$$2EndNote$$aThesis
000021706 3367_ $$2DRIVER$$adoctoralThesis
000021706 3367_ $$2BibTeX$$aPHDTHESIS
000021706 3367_ $$2DataCite$$aOutput Types/Dissertation
000021706 3367_ $$2ORCID$$aDISSERTATION
000021706 4900_ $$0PERI:(DE-600)2525100-4$$aSchriften des Forschungszentrums Jülich. IAS Series$$v12
000021706 502__ $$aRWTH Aachen, Diss., 2012$$bDr. (FH)$$cRWTH Aachen$$d2012
000021706 500__ $$aRecord converted from JUWEL: 18.07.2013
000021706 500__ $$aRecord converted from VDB: 12.11.2012
000021706 520__ $$aSupercomputers play a key role in countless areas of science and engineering, enabling the development of new insights and technological advances never possible before. The strategic importance and ever-growing complexity of the efficient usage of supercomputing resources make application performance analysis invaluable for the development of parallel codes. Runtime call-path profiling is a conventional, well-known method used for collecting summary statistics of an execution such as the time spent in different call paths of the code. However, these kinds of measurements only give the user a summary overview of the entire execution, without regard to changes in performance behavior over time.
The possible causes of temporal changes are quite numerous, ranging from adaptive workload balancing through periodically executed extra work or distinct computational phases to system noise. As present-day scientific applications tend to be run for extended periods of time, understanding the patterns and trends in the performance data along the time axis becomes crucial. A straightforward approach is profiling every iteration of the main loop separately. As shown by our analysis of a representative set of scientific codes, such measurements provide a wealth of new data that often leads to invaluable new insights. However, the introduction of the time dimension makes the amount of data collected proportional to the number of iterations, and memory usage and file sizes grow considerably. To counter this problem, a low-overhead online compression algorithm was developed that requires only a fraction of the memory and file sizes needed for an uncompressed measurement. By exploiting similarities between different iterations, the lossy compression algorithm allows all the relevant temporal patterns of the performance behavior to be reconstructed. While standard, direct instrumentation, which is assumed by the initial version of the compression algorithm, results in fairly low overhead with many scientific codes, in some cases the high frequency of events (e.g., tiny C++ member function calls) makes such measurements impractical. To overcome this problem, a sampling-based methodology could be used instead, where the amount of measurement overhead becomes a function of the sampling frequency, independent of the function-call frequency. However, sampling alone is insufficient for our purposes, as it does not provide access to the communication metrics the compression algorithm heavily depends on.
Therefore, a hybrid solution was developed that seamlessly integrates both types of measurement techniques in a single unified measurement, using direct instrumentation for message-passing constructs, while sampling the rest of the code. Finally, the compression algorithm was adapted to the hybrid profiling approach, avoiding the overhead of pure direct instrumentation. Evaluation of the above methodologies shows that our semantics-based compression algorithm provides a very good approximation of the original data with very little measurement dilation, while the hybrid combination of sampling and direct instrumentation fulfills its purpose by showing the expected reduction of measurement dilation in cases unsuitable for direct instrumentation. Beyond testing with standardized benchmark suites, the usefulness of these techniques was demonstrated by their key role in gaining important new insights into the performance characteristics of real-world applications.
000021706 536__ $$0G:(DE-Juel1)FUEK411$$2G:(DE-HGF)$$aScientific Computing (FUEK411)$$cFUEK411$$x0
000021706 536__ $$0G:(DE-HGF)POF2-411$$a411 - Computational Science and Mathematical Methods (POF2-411)$$cPOF2-411$$fPOF II$$x1
000021706 536__ $$0G:(DE-Juel-1)ATMLPP$$aATMLPP - ATML Parallel Performance (ATMLPP)$$cATMLPP$$x2
000021706 655_7 $$aHochschulschrift$$xDissertation (FH)
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.pdf$$yOpenAccess
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
000021706 8564_ $$uhttps://juser.fz-juelich.de/record/21706/files/IAS_Series_12.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
000021706 909CO $$ooai:juser.fz-juelich.de:21706$$pdnbdelivery$$pVDB$$pdriver$$purn$$popen_access$$popenaire
000021706 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000021706 9141_ $$y2012
000021706 9132_ $$0G:(DE-HGF)POF3-511$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vComputational Science and Mathematical Methods$$x0
000021706 9131_ $$0G:(DE-HGF)POF2-411$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vComputational Science and Mathematical Methods$$x1
000021706 920__ $$lyes
000021706 9201_ $$0I:(DE-Juel1)JSC-20090406$$gJSC$$kJSC$$lJülich Supercomputing Centre$$x0
000021706 970__ $$aVDB:(DE-Juel1)137764
000021706 980__ $$aVDB
000021706 980__ $$aConvertedRecord
000021706 980__ $$aphd
000021706 980__ $$aI:(DE-Juel1)JSC-20090406
000021706 980__ $$aUNRESTRICTED
000021706 980__ $$aJUWEL
000021706 980__ $$aFullTexts
000021706 9801_ $$aFullTexts