000281229 001__ 281229
000281229 005__ 20210129221650.0
000281229 0247_ $$2doi$$a10.1002/cpe.3552
000281229 0247_ $$2ISSN$$a1040-3108
000281229 0247_ $$2ISSN$$a1096-9128
000281229 0247_ $$2ISSN$$a1532-0626
000281229 0247_ $$2ISSN$$a1532-0634
000281229 0247_ $$2WOS$$aWOS:000376263300002
000281229 037__ $$aFZJ-2016-00928
000281229 082__ $$a004
000281229 1001_ $$0P:(DE-Juel1)144660$$aAlvarez Mallon, Damian$$b0$$eCorresponding author$$ufzj
000281229 245__ $$aMPI and UPC broadcast, scatter and gather algorithms in Xeon Phi
000281229 260__ $$aChichester$$bWiley$$c2016
000281229 3367_ $$2DRIVER$$aarticle
000281229 3367_ $$2DataCite$$aOutput Types/Journal article
000281229 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1462946632_5389
000281229 3367_ $$2BibTeX$$aARTICLE
000281229 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000281229 3367_ $$00$$2EndNote$$aJournal Article
000281229 520__ $$aAccelerators have revolutionised the high performance computing (HPC) community. Despite their advantages, their very specific programming models and limited communication capabilities have kept them in a supporting role to the main processors. With the introduction of Xeon Phi, this is no longer true, as it can be programmed as the main processor and has direct access to the InfiniBand network adapter. Collective operations play a key role in many HPC applications. Therefore, studying their behaviour in the context of manycore coprocessors is of great importance. This work analyses the performance of different algorithms for broadcast, scatter and gather on a large-scale Xeon Phi supercomputer. The algorithms evaluated are those available in the reference message passing interface (MPI) implementation for Xeon Phi (Intel MPI), the default algorithm in an optimised MPI implementation (MVAPICH2-MIC), and a new set of algorithms, developed by the authors of this work, designed with modern processors and new communication features in mind. The latter are implemented in Unified Parallel C (UPC), a partitioned global address space language, leveraging one-sided communications, hierarchical trees and message pipelining. This study scales the experiments to 15360 cores on the Stampede supercomputer and compares the results to Xeon and hybrid Xeon + Xeon Phi experiments with up to 19456 cores.
000281229 536__ $$0G:(DE-HGF)POF3-513$$a513 - Supercomputer Facility (POF3-513)$$cPOF3-513$$fPOF III$$x0
000281229 588__ $$aDataset connected to CrossRef
000281229 7001_ $$0P:(DE-HGF)0$$aTaboada, Guillermo L.$$b1
000281229 7001_ $$0P:(DE-HGF)0$$aKoesterke, Lars$$b2
000281229 770__ $$aSpecial Issue on Heterogeneous and Unconventional Cluster Architectures and Applications
000281229 773__ $$0PERI:(DE-600)2052606-4$$a10.1002/cpe.3552$$gp. 2322 - 2340$$n8$$p2322–2340$$tConcurrency and computation$$v28$$x1532-0626$$y2016
000281229 909CO $$ooai:juser.fz-juelich.de:281229$$pVDB
000281229 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144660$$aForschungszentrum Jülich GmbH$$b0$$kFZJ
000281229 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aExternal Institute$$b1$$kExtern
000281229 9101_ $$0I:(DE-HGF)0$$6P:(DE-HGF)0$$aExternal Institute$$b2$$kExtern
000281229 9131_ $$0G:(DE-HGF)POF3-513$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vSupercomputer Facility$$x0
000281229 9141_ $$y2016
000281229 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000281229 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology
000281229 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bCONCURR COMP-PRACT E : 2014
000281229 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000281229 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000281229 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5
000281229 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000281229 915__ $$0StatID:(DE-HGF)0550$$2StatID$$aNo Authors Fulltext
000281229 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000281229 920__ $$lyes
000281229 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Centre$$x0
000281229 980__ $$ajournal
000281229 980__ $$aVDB
000281229 980__ $$aUNRESTRICTED
000281229 980__ $$aI:(DE-Juel1)JSC-20090406