Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric

Schieffer, Gabin; Shi, Ruimin; Peng, Ivy; Faj, Jennifer; Herten, Andreas; Markidis, Stefano

doi:10.1109/SCW63240.2024.00079

Marc 21

001			1037595
005			20250822121514.0
024	7	_	\|a 10.1109/SCW63240.2024.00079 \|2 doi
024	7	_	\|a WOS:001451792300060 \|2 WOS
037	_	_	\|a FZJ-2025-00766
041	_	_	\|a English
100	1	_	\|a Schieffer, Gabin \|0 P:(DE-HGF)0 \|b 0 \|e Corresponding author
111	2	_	\|a SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis \|g SC24 \|c Atlanta, GA \|d 2024-11-17 - 2024-11-22 \|w USA
245	_	_	\|a Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
260	_	_	\|c 2024 \|b IEEE
300	_	_	\|a 567-576
336	7	_	\|a CONFERENCE_PAPER \|2 ORCID
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a Output Types/Conference Paper \|2 DataCite
336	7	_	\|a Contribution to a conference proceedings \|b contrib \|m contrib \|0 PUB:(DE-HGF)8 \|s 1737372510_8877 \|2 PUB:(DE-HGF)
520	_	_	\|a Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application performance. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movements on multi-GPU systems, stressing different communication options on AMD MI250X GPUs, including point-to-point and collective communication, and memory allocation strategies between GPUs, as well as the host CPU. In a single-node setup with four GPUs, we show that direct peer-to-peer memory accesses between GPUs and utilization of the RCCL library outperform MPI-based solutions in terms of memory/communication latency and bandwidth. Our test and evaluation method serves as a base for validating memory and communication strategies on a system and improving applications on AMD multi-GPU computing systems.
536	_	_	\|a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) \|0 G:(DE-HGF)POF4-5112 \|c POF4-511 \|f POF IV \|x 0
536	_	_	\|a ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV) \|0 G:(DE-Juel-1)ATML-X-DEV \|c ATML-X-DEV \|x 1
588	_	_	\|a Dataset connected to DataCite
700	1	_	\|a Shi, Ruimin \|0 P:(DE-HGF)0 \|b 1
700	1	_	\|a Markidis, Stefano \|0 P:(DE-HGF)0 \|b 2
700	1	_	\|a Herten, Andreas \|0 P:(DE-Juel1)145478 \|b 3
700	1	_	\|a Faj, Jennifer \|0 P:(DE-HGF)0 \|b 4
700	1	_	\|a Peng, Ivy \|0 P:(DE-HGF)0 \|b 5
770	_	_	\|z 979-8-3503-5554-3
773	_	_	\|a 10.1109/SCW63240.2024.00079
856	4	_	\|u https://juser.fz-juelich.de/record/1037595/files/Understanding_Data_Movement_in_AMD_Multi-GPU_Systems_with_Infinity_Fabric.pdf \|y Restricted
909	C	O	\|o oai:juser.fz-juelich.de:1037595 \|p VDB
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 3 \|6 P:(DE-Juel1)145478
913	1	_	\|a DE-HGF \|b Key Technologies \|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action \|1 G:(DE-HGF)POF4-510 \|0 G:(DE-HGF)POF4-511 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Enabling Computational- & Data-Intensive Science and Engineering \|9 G:(DE-HGF)POF4-5112 \|x 0
914	1	_	\|y 2024
920	_	_	\|l yes
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a contrib
980	_	_	\|a VDB
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	_	_	\|a UNRESTRICTED
