Design of scalable PGAS collectives for NUMA and manycore systems

Alvarez Mallon, Damian

Marc 21

001			172875
005			20210129214525.0
037	_	_	\|a FZJ-2014-06309
100	1	_	\|a Alvarez Mallon, Damian \|b 0 \|e Corresponding Author \|g male \|0 P:(DE-Juel1)144660 \|u fzj
245	_	_	\|a Design of scalable PGAS collectives for NUMA and manycore systems \|f 2014-10-27
260	_	_	\|c 2014
300	_	_	\|a 239 p.
336	7	_	\|a Dissertation / PhD Thesis \|b phd \|m phd \|0 PUB:(DE-HGF)11 \|s 1417444946_27072 \|2 PUB:(DE-HGF)
336	7	_	\|a Thesis \|0 2 \|2 EndNote
336	7	_	\|a doctoralThesis \|2 DRIVER
336	7	_	\|a PHDTHESIS \|2 BibTeX
336	7	_	\|a Output Types/Dissertation \|2 DataCite
336	7	_	\|a DISSERTATION \|2 ORCID
502	_	_	\|a University of A Coruna, Diss., 2014 \|c University of A Coruna \|b Dr. \|d 2014
520	_	_	\|a The increasing number of cores per processor is making multicore-based systems pervasive. This involves dealing with multiple levels of memory in NUMA systems, accessed via complex interconnects, in order to deliver the increasing amount of data required. The key to efficient and scalable provision of data is the use of collective communication operations that minimize the impact of bottlenecks. Leveraging one-sided communications becomes more important in these systems, as it avoids the synchronization between pairs of processes that occurs in collective operations implemented with two-sided point-to-point functions. This thesis proposes a series of collective algorithms that provide good performance and scalability. They use hierarchical trees, overlapping one-sided communications, message pipelining and NUMA binding. An implementation has been developed for UPC, a PGAS language whose performance has also been assessed in this thesis. In order to assess the performance of these algorithms, a new microbenchmarking tool has been designed and implemented. The performance evaluation of the algorithms, conducted on 6 representative systems with 5 different processor architectures and 5 different interconnect technologies, has shown generally good performance and scalability, outperforming leading MPI algorithms in many cases, which confirms the suitability of the developed algorithms for multi- and manycore architectures.
536	_	_	\|a 899 - ohne Topic (POF2-899) \|0 G:(DE-HGF)POF2-899 \|c POF2-899 \|x 0 \|f POF I
650	_	7	\|a Dissertation \|0 V:(DE-588b)4012494-0 \|2 GND \|x Diss.
773	_	_	\|y 2014
856	4	_	\|u http://ruc.udc.es/dspace/handle/2183/13755?locale=en
909	C	O	\|o oai:juser.fz-juelich.de:172875 \|p VDB
910	1	_	\|a Forschungszentrum Jülich GmbH \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)144660
913	2	_	\|a DE-HGF \|b POF III \|l Forschungsbereich Materie \|1 G:(DE-HGF)POF3-890 \|0 G:(DE-HGF)POF3-899 \|2 G:(DE-HGF)POF3-800 \|v ohne Programm \|x 0
913	1	_	\|a DE-HGF \|b Programmungebundene Forschung \|l ohne Programm \|1 G:(DE-HGF)POF2-890 \|0 G:(DE-HGF)POF2-899 \|2 G:(DE-HGF)POF2-800 \|v ohne Topic \|x 0 \|4 G:(DE-HGF)POF \|3 G:(DE-HGF)POF2
914	1	_	\|y 2014
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a phd
980	_	_	\|a VDB
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	_	_	\|a UNRESTRICTED
