000172874 001__ 172874
000172874 005__ 20210129214525.0
000172874 0247_ $$2doi$$a10.1007/s10586-014-0377-9
000172874 0247_ $$2ISSN$$a1386-7857
000172874 0247_ $$2ISSN$$a1573-7543
000172874 0247_ $$2WOS$$aWOS:000345077400027
000172874 037__ $$aFZJ-2014-06308
000172874 082__ $$a004
000172874 1001_ $$0P:(DE-Juel1)144660$$aAlvarez Mallon, Damian$$b0$$eCorresponding Author
000172874 245__ $$aScalable PGAS collective operations in NUMA clusters
000172874 260__ $$aDordrecht [u.a.]$$bSpringer Science + Business Media B.V$$c2014
000172874 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1417444599_27075
000172874 3367_ $$2DataCite$$aOutput Types/Journal article
000172874 3367_ $$00$$2EndNote$$aJournal Article
000172874 3367_ $$2BibTeX$$aARTICLE
000172874 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000172874 3367_ $$2DRIVER$$aarticle
000172874 520__ $$aThe increasing number of cores per processor is making manycore-based systems pervasive. This involves dealing with multiple levels of memory in non-uniform memory access (NUMA) systems and with processor core hierarchies, accessible via complex interconnects, in order to dispatch the increasing amount of data required by the processing elements. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, as it avoids the unnecessary synchronization between pairs of processes that arises in collective operations implemented in terms of two-sided point-to-point functions. This work proposes a series of algorithms that provide good performance and scalability in collective operations, based on the use of hierarchical trees, overlapping one-sided communications, message pipelining and the available NUMA binding features. An implementation has been developed for Unified Parallel C, a Partitioned Global Address Space language, which presents a shared-memory view across the nodes for programmability, while keeping private memory regions for performance. The performance evaluation of the proposed implementation, conducted on five representative systems (JuRoPA, JUDGE, Finis Terrae, SVG and Superdome), has shown generally good performance and scalability, even outperforming MPI in some cases, which confirms the suitability of the developed algorithms for manycore architectures.
000172874 536__ $$0G:(DE-HGF)POF2-41G21$$a41G - Supercomputer Facility (POF2-41G21)$$cPOF2-41G21$$fPOF II$$x0
000172874 588__ $$aDataset connected to CrossRef, juser.fz-juelich.de
000172874 7001_ $$0P:(DE-HGF)0$$aTaboada, Guillermo L.$$b1
000172874 7001_ $$0P:(DE-HGF)0$$aTeijeiro, Carlos$$b2
000172874 7001_ $$0P:(DE-HGF)0$$aGonzález-Domínguez, Jorge$$b3
000172874 7001_ $$0P:(DE-HGF)0$$aGómez, Andrés$$b4
000172874 7001_ $$0P:(DE-HGF)0$$aWibecan, Brian$$b5
000172874 773__ $$0PERI:(DE-600)2012757-1$$a10.1007/s10586-014-0377-9$$gVol. 17, no. 4, p. 1473 - 1495$$n4$$p1473 - 1495$$tCluster computing$$v17$$x1573-7543$$y2014
000172874 8564_ $$uhttps://juser.fz-juelich.de/record/172874/files/FZJ-2014-06308.pdf$$yRestricted
000172874 909CO $$ooai:juser.fz-juelich.de:172874$$pVDB
000172874 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144660$$aForschungszentrum Jülich GmbH$$b5$$kFZJ
000172874 9132_ $$0G:(DE-HGF)POF3-513$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$aDE-HGF$$bPOF III$$lKey Technologies$$vSupercomputing & Big Data$$x0
000172874 9131_ $$0G:(DE-HGF)POF2-41G21$$1G:(DE-HGF)POF2-410$$2G:(DE-HGF)POF2-400$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bSchlüsseltechnologien$$lSupercomputing$$vSupercomputer Facility$$x0
000172874 9141_ $$y2014
000172874 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR
000172874 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000172874 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000172874 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000172874 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000172874 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000172874 915__ $$0StatID:(DE-HGF)0420$$2StatID$$aNationallizenz
000172874 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology
000172874 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5
000172874 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000172874 980__ $$ajournal
000172874 980__ $$aVDB
000172874 980__ $$aI:(DE-Juel1)JSC-20090406
000172874 980__ $$aUNRESTRICTED