001037595 001__ 1037595
001037595 005__ 20250822121514.0
001037595 0247_ $$2doi$$a10.1109/SCW63240.2024.00079
001037595 0247_ $$2WOS$$aWOS:001451792300060
001037595 037__ $$aFZJ-2025-00766
001037595 041__ $$aEnglish
001037595 1001_ $$0P:(DE-HGF)0$$aSchieffer, Gabin$$b0$$eCorresponding author
001037595 1112_ $$aSC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis$$cAtlanta, GA$$d2024-11-17 - 2024-11-22$$gSC24$$wUSA
001037595 245__ $$aUnderstanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric
001037595 260__ $$bIEEE$$c2024
001037595 300__ $$a567-576
001037595 3367_ $$2ORCID$$aCONFERENCE_PAPER
001037595 3367_ $$033$$2EndNote$$aConference Paper
001037595 3367_ $$2BibTeX$$aINPROCEEDINGS
001037595 3367_ $$2DRIVER$$aconferenceObject
001037595 3367_ $$2DataCite$$aOutput Types/Conference Paper
001037595 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1737372510_8877
001037595 520__ $$aModern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application performance. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movements on multi-GPU systems, stressing different communication options on AMD MI250X GPUs, including point-to-point and collective communication, and memory allocation strategies between GPUs, as well as the host CPU. In a single-node setup with four GPUs, we show that direct peer-to-peer memory accesses between GPUs and utilization of the RCCL library outperform MPI-based solutions in terms of memory/communication latency and bandwidth. Our test and evaluation method serves as a base for validating memory and communication strategies on a system and improving applications on AMD multi-GPU computing systems.
001037595 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001037595 536__ $$0G:(DE-Juel-1)ATML-X-DEV$$aATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)$$cATML-X-DEV$$x1
001037595 588__ $$aDataset connected to DataCite
001037595 7001_ $$0P:(DE-HGF)0$$aShi, Ruimin$$b1
001037595 7001_ $$0P:(DE-HGF)0$$aMarkidis, Stefano$$b2
001037595 7001_ $$0P:(DE-Juel1)145478$$aHerten, Andreas$$b3
001037595 7001_ $$0P:(DE-HGF)0$$aFaj, Jennifer$$b4
001037595 7001_ $$0P:(DE-HGF)0$$aPeng, Ivy$$b5
001037595 770__ $$z979-8-3503-5554-3
001037595 773__ $$a10.1109/SCW63240.2024.00079
001037595 8564_ $$uhttps://juser.fz-juelich.de/record/1037595/files/Understanding_Data_Movement_in_AMD_Multi-GPU_Systems_with_Infinity_Fabric.pdf$$yRestricted
001037595 909CO $$ooai:juser.fz-juelich.de:1037595$$pVDB
001037595 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)145478$$aForschungszentrum Jülich$$b3$$kFZJ
001037595 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001037595 9141_ $$y2024
001037595 920__ $$lyes
001037595 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001037595 980__ $$acontrib
001037595 980__ $$aVDB
001037595 980__ $$aI:(DE-Juel1)JSC-20090406
001037595 980__ $$aUNRESTRICTED