Contribution to a conference proceedings FZJ-2025-00766

Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric


2024
IEEE

SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis, SC24, Atlanta, GA, USA, 17 Nov 2024 - 22 Nov 2024, IEEE, pp. 567-576 [10.1109/SCW63240.2024.00079]

Please use a persistent id in citations: doi:10.1109/SCW63240.2024.00079

Abstract: Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application performance. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movements on multi-GPU systems, stressing different communication options on AMD MI250X GPUs, including point-to-point and collective communication, and memory allocation strategies between GPUs, as well as the host CPU. In a single-node setup with four GPUs, we show that direct peer-to-peer memory accesses between GPUs and utilization of the RCCL library outperform MPI-based solutions in terms of memory/communication latency and bandwidth. Our test and evaluation method serves as a base for validating memory and communication strategies on a system and improving applications on AMD multi-GPU computing systems.


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  2. ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)

Appears in the scientific report 2024

The record appears in these collections:
Document types > Events > Contributions to proceedings
Workflow collections > Public records
Institute collections > JSC
Publications database

Record created 2025-01-20, last modified 2025-08-22

