001     1037595
005     20250822121514.0
024 7 _ |a 10.1109/SCW63240.2024.00079
|2 doi
024 7 _ |a WOS:001451792300060
|2 WOS
037 _ _ |a FZJ-2025-00766
041 _ _ |a English
100 1 _ |a Schieffer, Gabin
|0 P:(DE-HGF)0
|b 0
|e Corresponding author
111 2 _ |a SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis
|g SC24
|c Atlanta, GA
|d 2024-11-17 - 2024-11-22
|w USA
245 _ _ |a Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
260 _ _ |c 2024
|b IEEE
300 _ _ |a 567-576
336 7 _ |a CONFERENCE_PAPER
|2 ORCID
336 7 _ |a Conference Paper
|0 33
|2 EndNote
336 7 _ |a INPROCEEDINGS
|2 BibTeX
336 7 _ |a conferenceObject
|2 DRIVER
336 7 _ |a Output Types/Conference Paper
|2 DataCite
336 7 _ |a Contribution to a conference proceedings
|b contrib
|m contrib
|0 PUB:(DE-HGF)8
|s 1737372510_8877
|2 PUB:(DE-HGF)
520 _ _ |a Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application performance. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movements on multi-GPU systems, stressing different communication options on AMD MI250X GPUs, including point-to-point and collective communication, and memory allocation strategies between GPUs, as well as the host CPU. In a single-node setup with four GPUs, we show that direct peer-to-peer memory accesses between GPUs and utilization of the RCCL library outperform MPI-based solutions in terms of memory/communication latency and bandwidth. Our test and evaluation method serves as a base for validating memory and communication strategies on a system and improving applications on AMD multi-GPU computing systems.
536 _ _ |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5112
|c POF4-511
|f POF IV
|x 0
536 _ _ |a ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)
|0 G:(DE-Juel-1)ATML-X-DEV
|c ATML-X-DEV
|x 1
588 _ _ |a Dataset connected to DataCite
700 1 _ |a Shi, Ruimin
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Markidis, Stefano
|0 P:(DE-HGF)0
|b 2
700 1 _ |a Herten, Andreas
|0 P:(DE-Juel1)145478
|b 3
700 1 _ |a Faj, Jennifer
|0 P:(DE-HGF)0
|b 4
700 1 _ |a Peng, Ivy
|0 P:(DE-HGF)0
|b 5
770 _ _ |z 979-8-3503-5554-3
773 _ _ |a 10.1109/SCW63240.2024.00079
856 4 _ |u https://juser.fz-juelich.de/record/1037595/files/Understanding_Data_Movement_in_AMD_Multi-GPU_Systems_with_Infinity_Fabric.pdf
|y Restricted
909 C O |o oai:juser.fz-juelich.de:1037595
|p VDB
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)145478
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5112
|x 0
914 1 _ |y 2024
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a contrib
980 _ _ |a VDB
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a UNRESTRICTED