Preprint FZJ-2024-03363

Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC

2024

Please use a persistent id in citations: doi:

Abstract: The memory systems of High-Performance Computing (HPC) systems commonly feature non-uniform data paths to memory, i.e., they are non-uniform memory access (NUMA) architectures. Memory is divided into multiple regions, with each processing unit having its own local memory. As a result, each processing unit accesses its local memory region faster than non-local regions. Architectures with hybrid memory technologies introduce further non-uniformity. This paper presents case studies of the performance potential and data placement implications of non-uniform and heterogeneous memory in HPC systems. Using the gem5 and VPSim simulation platforms, we model NUMA systems with processors based on the ARMv8 Neoverse V1 Reference Design. The gem5 simulator provides a cycle-accurate view, while VPSim offers greater simulation speed with a high-level view of the simulated system. We highlight the performance impact of design trade-offs regarding NUMA node organization and System Level Cache (SLC) group assignment, as well as Network-on-Chip (NoC) configuration. Our case studies provide essential input to a co-design process involving HPC processor architects and system integrators. A comparison of system configurations for different NoC bandwidths shows reduced NoC latency and a substantial memory bandwidth improvement when NUMA control is enabled. Furthermore, a configuration with HBM2 memory organized as four NUMA nodes highlights the gap in memory bandwidth and the impact of NoC queuing latency between local and remote memory accesses. On the other hand, NUMA can result in an unbalanced distribution of memory accesses and reduced SLC hit ratios, as shown with DDR4 memory organized as four NUMA nodes.


Contributing Institute(s):
  1. Jülich Supercomputing Centre (JSC)
Research Program(s):
  1. 5122 - Future Computing & Big Data Systems (POF4-512)
  2. EPI SGA2 (16ME0507K)

Appears in the scientific report 2024
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Reports > Preprints
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2024-05-14, last modified 2024-05-24

