001     172874
005     20210129214525.0
024 7 _ |a 10.1007/s10586-014-0377-9
|2 doi
024 7 _ |a 1386-7857
|2 ISSN
024 7 _ |a 1573-7543
|2 ISSN
024 7 _ |a WOS:000345077400027
|2 WOS
037 _ _ |a FZJ-2014-06308
082 _ _ |a 004
100 1 _ |a Alvarez Mallon, Damian
|0 P:(DE-Juel1)144660
|b 0
|e Corresponding Author
245 _ _ |a Scalable PGAS collective operations in NUMA clusters
260 _ _ |a Dordrecht [u.a.]
|c 2014
|b Springer Science + Business Media B.V
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1417444599_27075
|2 PUB:(DE-HGF)
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|0 0
|2 EndNote
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a article
|2 DRIVER
520 _ _ |a The increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and with hierarchies of processor cores, accessible via complex interconnects in order to deliver the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid the unnecessary synchronization between pairs of processes that arises in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.
536 _ _ |a 41G - Supercomputer Facility (POF2-41G21)
|0 G:(DE-HGF)POF2-41G21
|c POF2-41G21
|f POF II
|x 0
588 _ _ |a Dataset connected to CrossRef, juser.fz-juelich.de
700 1 _ |a Taboada, Guillermo L.
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Teijeiro, Carlos
|0 P:(DE-HGF)0
|b 2
700 1 _ |a González-Domínguez, Jorge
|0 P:(DE-HGF)0
|b 3
700 1 _ |a Gómez, Andrés
|0 P:(DE-HGF)0
|b 4
700 1 _ |a Wibecan, Brian
|0 P:(DE-HGF)0
|b 5
773 _ _ |a 10.1007/s10586-014-0377-9
|g Vol. 17, no. 4, p. 1473 - 1495
|0 PERI:(DE-600)2012757-1
|n 4
|p 1473 - 1495
|t Cluster computing
|v 17
|y 2014
|x 1573-7543
856 4 _ |u https://juser.fz-juelich.de/record/172874/files/FZJ-2014-06308.pdf
|y Restricted
909 C O |o oai:juser.fz-juelich.de:172874
|p VDB
910 1 _ |a Forschungszentrum Jülich GmbH
|0 I:(DE-588b)5008462-8
|k FZJ
|b 5
|6 P:(DE-Juel1)144660
913 2 _ |a DE-HGF
|b POF III
|l Key Technologies
|1 G:(DE-HGF)POF3-510
|0 G:(DE-HGF)POF3-513
|2 G:(DE-HGF)POF3-500
|v Supercomputing & Big Data
|x 0
913 1 _ |a DE-HGF
|b Schlüsseltechnologien
|l Supercomputing
|1 G:(DE-HGF)POF2-410
|0 G:(DE-HGF)POF2-41G21
|2 G:(DE-HGF)POF2-400
|v Supercomputer Facility
|x 0
|4 G:(DE-HGF)POF
|3 G:(DE-HGF)POF2
914 1 _ |y 2014
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
915 _ _ |a WoS
|0 StatID:(DE-HGF)0111
|2 StatID
|b Science Citation Index Expanded
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Thomson Reuters Master Journal List
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
915 _ _ |a Nationallizenz
|0 StatID:(DE-HGF)0420
|2 StatID
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1160
|2 StatID
|b Current Contents - Engineering, Computing and Technology
915 _ _ |a IF < 5
|0 StatID:(DE-HGF)9900
|2 StatID
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a UNRESTRICTED

