Design of scalable PGAS collectives for NUMA and manycore systems

Alvarez Mallon, Damian
000172875 001__ 172875
000172875 005__ 20210129214525.0
000172875 037__ $$aFZJ-2014-06309
000172875 1001_ $$0P:(DE-Juel1)144660$$aAlvarez Mallon, Damian$$b0$$eCorresponding Author$$gmale$$ufzj
000172875 245__ $$aDesign of scalable PGAS collectives for NUMA and manycore systems$$f2014-10-27
000172875 260__ $$c2014
000172875 300__ $$a239 p.
000172875 3367_ $$0PUB:(DE-HGF)11$$2PUB:(DE-HGF)$$aDissertation / PhD Thesis$$bphd$$mphd$$s1417444946_27072
000172875 3367_ $$02$$2EndNote$$aThesis
000172875 3367_ $$2DRIVER$$adoctoralThesis
000172875 3367_ $$2BibTeX$$aPHDTHESIS
000172875 3367_ $$2DataCite$$aOutput Types/Dissertation
000172875 3367_ $$2ORCID$$aDISSERTATION
000172875 502__ $$aUniversity of A Coruna, Diss., 2014$$bDr.$$cUniversity of A Coruna$$d2014
000172875 520__ $$aThe increasing number of cores per processor is making multicore-based systems pervasive. This involves dealing with multiple levels of memory in NUMA systems, accessible via complex interconnects in order to dispatch the increasing amount of data required. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, to avoid the synchronization between pairs of processes that arises in collective operations implemented using two-sided point-to-point functions. This Thesis proposes a series of collective algorithms that provide good performance and scalability. They use hierarchical trees, overlapping one-sided communications, message pipelining and NUMA binding. An implementation has been developed for UPC, a PGAS language whose performance has also been assessed in this Thesis. In order to assess the performance of these algorithms, a new microbenchmarking tool has been designed and implemented. The performance evaluation of the algorithms, conducted on 6 representative systems with 5 different processor architectures and 5 different interconnect technologies, has shown generally good performance and scalability, outperforming leading MPI algorithms in many cases, which confirms the suitability of the developed algorithms for multi- and manycore architectures.
000172875 536__ $$0G:(DE-HGF)POF2-899$$a899 - ohne Topic (POF2-899)$$cPOF2-899$$fPOF I$$x0
000172875 650_7 $$0V:(DE-588b)4012494-0$$2GND$$aDissertation$$xDiss.
000172875 773__ $$y2014
000172875 8564_ $$uhttp://ruc.udc.es/dspace/handle/2183/13755?locale=en
000172875 909CO $$ooai:juser.fz-juelich.de:172875$$pVDB
000172875 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)144660$$aForschungszentrum Jülich GmbH$$b0$$kFZJ
000172875 9132_ $$0G:(DE-HGF)POF3-899$$1G:(DE-HGF)POF3-890$$2G:(DE-HGF)POF3-800$$aDE-HGF$$bPOF III$$lForschungsbereich Materie$$vohne Programm$$x0
000172875 9131_ $$0G:(DE-HGF)POF2-899$$1G:(DE-HGF)POF2-890$$2G:(DE-HGF)POF2-800$$3G:(DE-HGF)POF2$$4G:(DE-HGF)POF$$aDE-HGF$$bProgrammungebundene Forschung$$lohne Programm$$vohne Topic$$x0
000172875 9141_ $$y2014
000172875 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000172875 980__ $$aphd
000172875 980__ $$aVDB
000172875 980__ $$aI:(DE-Juel1)JSC-20090406
000172875 980__ $$aUNRESTRICTED