001     1026292
005     20240524205938.0
024 7 _ |a 10.34734/FZJ-2024-03363
|2 datacite_doi
037 _ _ |a FZJ-2024-03363
100 1 _ |a Zaourar, Lilia
|0 P:(DE-HGF)0
|b 0
|e Corresponding author
245 _ _ |a Case Studies on the Impact and Challenges of Heterogeneous NUMA Architectures for HPC
260 _ _ |c 2024
336 7 _ |a Preprint
|b preprint
|m preprint
|0 PUB:(DE-HGF)25
|s 1716539971_2344
|2 PUB:(DE-HGF)
336 7 _ |a WORKING_PAPER
|2 ORCID
336 7 _ |a Electronic Article
|0 28
|2 EndNote
336 7 _ |a preprint
|2 DRIVER
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a Output Types/Working Paper
|2 DataCite
520 _ _ |a The memory systems of High-Performance Computing (HPC) systems commonly feature non-uniform data paths to memory, i.e., they are non-uniform memory access (NUMA) architectures. Memory is divided into multiple regions, with each processing unit having its own local memory. For each processing unit, access to its local memory region is therefore faster than access to non-local regions. Architectures that combine hybrid memory technologies introduce further non-uniformity. This paper presents case studies of the performance potential and data placement implications of non-uniform and heterogeneous memory in HPC systems. Using the gem5 and VPSim simulation platforms, we model NUMA systems with processors based on the ARMv8 Neoverse V1 Reference Design. The gem5 simulator provides a cycle-accurate view, while VPSim offers greater simulation speed with a higher-level view of the simulated system. We highlight the performance impact of design trade-offs in NUMA node organization, System Level Cache (SLC) group assignment, and Network-on-Chip (NoC) configuration. Our case studies provide essential input to a co-design process involving HPC processor architects and system integrators. A comparison of system configurations for different NoC bandwidths shows reduced NoC latency and a marked improvement in memory bandwidth when NUMA control is enabled. Furthermore, a configuration with HBM2 memory organized as four NUMA nodes highlights the gap in memory bandwidth and the impact on NoC queuing latency between local and remote memory accesses. On the other hand, NUMA can result in an unbalanced distribution of memory accesses and reduced SLC hit ratios, as shown with DDR4 memory organized as four NUMA nodes.
536 _ _ |a 5122 - Future Computing & Big Data Systems (POF4-512)
|0 G:(DE-HGF)POF4-5122
|c POF4-512
|f POF IV
|x 0
536 _ _ |a EPI SGA2 (16ME0507K)
|0 G:(BMBF)16ME0507K
|c 16ME0507K
|x 1
700 1 _ |a Benazouz, Mohamed
|0 P:(DE-HGF)0
|b 1
700 1 _ |a Mouhagir, Ayoub
|0 P:(DE-HGF)0
|b 2
700 1 _ |a Falquez, Carlos
|0 P:(DE-Juel1)179531
|b 3
|u fzj
700 1 _ |a Portero, Antoni
|0 P:(DE-Juel1)177768
|b 4
|u fzj
700 1 _ |a Ho, Nam
|0 P:(DE-Juel1)176469
|b 5
|u fzj
700 1 _ |a Suarez, Estela
|0 P:(DE-Juel1)142361
|b 6
|u fzj
700 1 _ |a Petrakis, Polydoros
|0 P:(DE-HGF)0
|b 7
700 1 _ |a Marazakis, Manolis
|0 P:(DE-HGF)0
|b 8
700 1 _ |a Sgherzi, Francesco
|0 P:(DE-HGF)0
|b 9
700 1 _ |a Fernandez, Ivan
|0 P:(DE-HGF)0
|b 10
700 1 _ |a Dolbeau, Romain
|0 P:(DE-HGF)0
|b 11
700 1 _ |a Pleiter, Dirk
|0 P:(DE-HGF)0
|b 12
856 4 _ |y OpenAccess
|u https://juser.fz-juelich.de/record/1026292/files/arcs_2024_preprint.pdf
856 4 _ |y OpenAccess
|x icon
|u https://juser.fz-juelich.de/record/1026292/files/arcs_2024_preprint.gif?subformat=icon
856 4 _ |y OpenAccess
|x icon-1440
|u https://juser.fz-juelich.de/record/1026292/files/arcs_2024_preprint.jpg?subformat=icon-1440
856 4 _ |y OpenAccess
|x icon-180
|u https://juser.fz-juelich.de/record/1026292/files/arcs_2024_preprint.jpg?subformat=icon-180
856 4 _ |y OpenAccess
|x icon-640
|u https://juser.fz-juelich.de/record/1026292/files/arcs_2024_preprint.jpg?subformat=icon-640
909 C O |o oai:juser.fz-juelich.de:1026292
|p openaire
|p open_access
|p VDB
|p driver
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)179531
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 4
|6 P:(DE-Juel1)177768
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 5
|6 P:(DE-Juel1)176469
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 6
|6 P:(DE-Juel1)142361
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-512
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Supercomputing & Big Data Infrastructures
|9 G:(DE-HGF)POF4-5122
|x 0
914 1 _ |y 2024
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 1 _ |a FullTexts
980 _ _ |a preprint
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406

