001     128163
005     20250314084107.0
024 7 _ |a 10.1016/j.parco.2012.09.002
|2 doi
024 7 _ |a 0167-8191
|2 ISSN
024 7 _ |a 1872-7336
|2 ISSN
024 7 _ |a WOS:000317371900004
|2 WOS
037 _ _ |a FZJ-2012-01058
082 _ _ |a 004
100 1 _ |a Hermanns, Marc-André
|0 P:(DE-HGF)0
|b 0
|e Corresponding author
245 _ _ |a A scalable infrastructure for the performance analysis of passive target synchronization
260 _ _ |a Amsterdam [et al.]
|c 2013
|b North-Holland, Elsevier Science
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1366635474_9685
|2 PUB:(DE-HGF)
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|0 0
|2 EndNote
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a article
|2 DRIVER
520 _ _ |a Partitioned global address space (PGAS) languages combine the convenient abstraction of shared memory with the notion of affinity, extending multi-threaded programming to large-scale systems with physically distributed memory. However, in spite of their obvious advantages, PGAS languages still lack appropriate tool support for performance analysis, one of the reasons why their adoption is still in its infancy. Some of the performance problems for which tool support is needed occur at the level of the underlying one-sided communication substrate, such as the Aggregate Remote Memory Copy Interface (ARMCI). One such example is the waiting time in situations where asynchronous data transfers cannot be completed without software intervention at the target side. This is not uncommon on systems with reduced operating-system kernels such as IBM Blue Gene/P where the use of progress threads would double the number of cores necessary to run an application. In this paper, we present an extension of the Scalasca trace-analysis infrastructure aimed at the identification and quantification of progress-related waiting times at larger scales. We demonstrate its utility and scalability using a benchmark running with up to 32,768 processes.
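The abstract above concerns waiting times in passive target synchronization, where a one-sided transfer cannot complete without software intervention on the target side. As an illustration only (not taken from the paper, whose infrastructure targets the ARMCI library), the following minimal C sketch uses the analogous MPI-3 RMA passive-target pattern: the origin locks the target's window, issues a put, and unlocks, while the target makes no matching call. On systems without asynchronous progress, such a transfer may stall until the target process next enters the communication library, which is the kind of waiting time the Scalasca extension described above quantifies. Ranks, buffer sizes, and window layout are illustrative assumptions.

/* Minimal sketch (not from the paper): passive-target one-sided
 * communication with MPI-3 RMA, analogous to the ARMCI transfers
 * analyzed in the article. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Expose one double per process through an RMA window. */
    double local = (double)rank;
    MPI_Win win;
    MPI_Win_create(&local, sizeof(double), sizeof(double),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (rank == 0 && nprocs > 1) {
        double value = 42.0;
        /* Passive-target epoch: the target (rank 1) makes no matching call. */
        MPI_Win_lock(MPI_LOCK_EXCLUSIVE, 1, 0, win);
        MPI_Put(&value, 1, MPI_DOUBLE, 1, 0, 1, MPI_DOUBLE, win);
        MPI_Win_unlock(1, win);   /* completes the transfer at origin and target */
    }

    MPI_Barrier(MPI_COMM_WORLD);
    if (rank == 1)
        printf("rank 1 now holds %.1f\n", local);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}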
536 _ _ |a 411 - Computational Science and Mathematical Methods (POF2-411)
|0 G:(DE-HGF)POF2-411
|c POF2-411
|x 0
|f POF II
536 _ _ |0 G:(DE-Juel-1)ATMLPP
|a ATMLPP - ATML Parallel Performance (ATMLPP)
|c ATMLPP
|x 1
588 _ _ |a Dataset connected to CrossRef, juser.fz-juelich.de
700 1 _ |a Krishnamoorthy, Sriram
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Wolf, Felix
|0 P:(DE-Juel1)132299
|b 2
773 _ _ |a 10.1016/j.parco.2012.09.002
|0 PERI:(DE-600)1466340-5
|n 3
|p 132-145
|t Parallel computing
|v 39
856 4 _ |u https://juser.fz-juelich.de/record/128163/files/FZJ-2012-01058.pdf
|y Restricted
909 C O |o oai:juser.fz-juelich.de:128163
|p VDB
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 2
|6 P:(DE-Juel1)132299
913 2 _ |a DE-HGF
|b Key Technologies
|l Supercomputing & Big Data
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-511
|2 G:(DE-HGF)POF3-500
|v Computational Science and Mathematical Methods
|x 0
913 1 _ |a DE-HGF
|b Key Technologies
|l Supercomputing
|1 G:(DE-HGF)POF2-410
|0 G:(DE-HGF)POF2-411
|2 G:(DE-HGF)POF2-400
|v Computational Science and Mathematical Methods
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF2
914 1 _ |y 2013
915 _ _ |a JCR/ISI refereed
|0 StatID:(DE-HGF)0010
|2 StatID
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
915 _ _ |a WoS
|0 StatID:(DE-HGF)0111
|2 StatID
|b Science Citation Index Expanded
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Thomson Reuters Master Journal List
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1050
|2 StatID
|b BIOSIS Previews
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Centre
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406

