| 001 | 1037595 | ||
| 005 | 20250822121514.0 | ||
| 024 | 7 | _ | |a 10.1109/SCW63240.2024.00079 |2 doi |
| 024 | 7 | _ | |a WOS:001451792300060 |2 WOS |
| 037 | _ | _ | |a FZJ-2025-00766 |
| 041 | _ | _ | |a English |
| 100 | 1 | _ | |a Schieffer, Gabin |0 P:(DE-HGF)0 |b 0 |e Corresponding author |
| 111 | 2 | _ | |a SC24-W: Workshops of the International Conference for High Performance Computing, Networking, Storage and Analysis |g SC24 |c Atlanta, GA |d 2024-11-17 - 2024-11-22 |w USA |
| 245 | _ | _ | |a Understanding Data Movement in AMD Multi-GPU Systems with Infinity Fabric |
| 260 | _ | _ | |c 2024 |b IEEE |
| 300 | _ | _ | |a 567-576 |
| 336 | 7 | _ | |a CONFERENCE_PAPER |2 ORCID |
| 336 | 7 | _ | |a Conference Paper |0 33 |2 EndNote |
| 336 | 7 | _ | |a INPROCEEDINGS |2 BibTeX |
| 336 | 7 | _ | |a conferenceObject |2 DRIVER |
| 336 | 7 | _ | |a Output Types/Conference Paper |2 DataCite |
| 336 | 7 | _ | |a Contribution to a conference proceedings |b contrib |m contrib |0 PUB:(DE-HGF)8 |s 1737372510_8877 |2 PUB:(DE-HGF) |
| 520 | _ | _ | |a Modern GPU systems are constantly evolving to meet the needs of computing-intensive applications in scientific and machine learning domains. However, there is typically a gap between the hardware capacity and the achievable application performance. This work aims to provide a better understanding of the Infinity Fabric interconnects on AMD GPUs and CPUs. We propose a test and evaluation methodology for characterizing the performance of data movements on multi-GPU systems, stressing different communication options on AMD MI250X GPUs, including point-to-point and collective communication, and memory allocation strategies between GPUs, as well as the host CPU. In a single-node setup with four GPUs, we show that direct peer-to-peer memory accesses between GPUs and utilization of the RCCL library outperform MPI-based solutions in terms of memory/communication latency and bandwidth. Our test and evaluation method serves as a base for validating memory and communication strategies on a system and improving applications on AMD multi-GPU computing systems. |
| 536 | _ | _ | |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) |0 G:(DE-HGF)POF4-5112 |c POF4-511 |f POF IV |x 0 |
| 536 | _ | _ | |a ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV) |0 G:(DE-Juel-1)ATML-X-DEV |c ATML-X-DEV |x 1 |
| 588 | _ | _ | |a Dataset connected to DataCite |
| 700 | 1 | _ | |a Shi, Ruimin |0 P:(DE-HGF)0 |b 1 |
| 700 | 1 | _ | |a Markidis, Stefano |0 P:(DE-HGF)0 |b 2 |
| 700 | 1 | _ | |a Herten, Andreas |0 P:(DE-Juel1)145478 |b 3 |
| 700 | 1 | _ | |a Faj, Jennifer |0 P:(DE-HGF)0 |b 4 |
| 700 | 1 | _ | |a Peng, Ivy |0 P:(DE-HGF)0 |b 5 |
| 770 | _ | _ | |z 979-8-3503-5554-3 |
| 773 | _ | _ | |a 10.1109/SCW63240.2024.00079 |
| 856 | 4 | _ | |u https://juser.fz-juelich.de/record/1037595/files/Understanding_Data_Movement_in_AMD_Multi-GPU_Systems_with_Infinity_Fabric.pdf |y Restricted |
| 909 | C | O | |o oai:juser.fz-juelich.de:1037595 |p VDB |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 3 |6 P:(DE-Juel1)145478 |
| 913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-511 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Enabling Computational- & Data-Intensive Science and Engineering |9 G:(DE-HGF)POF4-5112 |x 0 |
| 914 | 1 | _ | |y 2024 |
| 920 | _ | _ | |l yes |
| 920 | 1 | _ | |0 I:(DE-Juel1)JSC-20090406 |k JSC |l Jülich Supercomputing Center |x 0 |
| 980 | _ | _ | |a contrib |
| 980 | _ | _ | |a VDB |
| 980 | _ | _ | |a I:(DE-Juel1)JSC-20090406 |
| 980 | _ | _ | |a UNRESTRICTED |