Design of scalable PGAS collectives for NUMA and manycore systems

Alvarez Mallon, Damian
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@PHDTHESIS{AlvarezMallon:172875,
      author       = {Alvarez Mallon, Damian},
      title        = {{D}esign of scalable {PGAS} collectives for {NUMA} and
                      manycore systems},
      school       = {University of A Coruna},
      type         = {Dr.},
      reportid     = {FZJ-2014-06309},
      pages        = {239 p.},
      year         = {2014},
      note         = {University of A Coruna, Diss., 2014},
      abstract     = {The increasing number of cores per processor is making
                      multicore-based systems pervasive. This involves dealing
                      with multiple levels of memory in NUMA systems, accessible
                      via complex interconnects in order to dispatch the
                      increasing amount of data required. The key to efficient
                      and scalable provision of data is the use of collective
                      communication operations that minimize the impact of
                      bottlenecks. Leveraging one-sided communications becomes
                      more important in these systems, to avoid synchronization
                      between pairs of processes in collective operations
                      implemented using two-sided point-to-point functions. This
                      Thesis proposes a series of collective algorithms that
                      provide good performance and scalability. They use
                      hierarchical trees, overlapping one-sided communications,
                      message pipelining and NUMA binding. An implementation has
                      been developed for UPC, a PGAS language whose performance
                      has also been assessed in this Thesis. In order to assess
                      the performance of these algorithms a new microbenchmarking
                      tool has been designed and implemented. The performance
                      evaluation of the algorithms, conducted on 6 representative
                      systems, with 5 different processor architectures and 5
                      different interconnect technologies, has shown generally
                      good performance and scalability, outperforming leading MPI
                      algorithms in many cases, which confirms the suitability of
                      the developed algorithms for multi- and manycore
                      architectures.},
      keywords     = {Dissertation (GND)},
      cin          = {JSC},
      cid          = {I:(DE-Juel1)JSC-20090406},
      pnm          = {899 - ohne Topic (POF2-899)},
      pid          = {G:(DE-HGF)POF2-899},
      typ          = {PUB:(DE-HGF)11},
      url          = {https://juser.fz-juelich.de/record/172875},
}