001     1042334
005     20250804115218.0
024 7 _ |a 10.1109/ACCESS.2025.3569533
|2 doi
024 7 _ |a 10.34734/FZJ-2025-02537
|2 datacite_doi
024 7 _ |a WOS:001492121500023
|2 WOS
037 _ _ |a FZJ-2025-02537
082 _ _ |a 621.3
100 1 _ |a Ho, Nam
|0 P:(DE-Juel1)176469
|b 0
|e Corresponding author
|u fzj
245 _ _ |a Memory Prefetching Evaluation of Scientific Applications on a Modern HPC Arm-Based Processor
260 _ _ |a New York, NY
|c 2025
|b IEEE
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1753033141_20204
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a Memory prefetching is a well-known technique for mitigating the negative impact of memory access latencies on memory bandwidth. This problem has become more pressing as improvements in memory bandwidth have not kept pace with increases in computational power. While much existing work has been devoted to finding appropriate prefetching techniques for specific workloads, few studies provide insight into the behavior of scientific applications that would help explain the impact of prefetchers. This paper investigates the impact of hardware prefetchers on the latest Arm-based high-end processor architectures. We investigate memory access patterns by analyzing locality properties and visualizing delta and repetitive address patterns. A deeper understanding of memory access patterns makes it possible to select an appropriate prefetcher and to better correlate access pattern properties with prefetcher performance, which can guide future co-design efforts. We evaluated traditional and innovative prefetchers using a gem5-based model of Arm Neoverse V1 cores. The model features a 16-core architecture, using Amazon’s Graviton 3 processor as a hardware reference but replacing DDR5 with high-bandwidth memory (HBM2). We performed a detailed prefetching evaluation focusing on stencil, sparse matrix-vector multiplication, and breadth-first search kernels. These kernels represent a broad range of the applications that run on today’s high-performance computing (HPC) systems and are sensitive to memory performance.
536 _ _ |a 5122 - Future Computing & Big Data Systems (POF4-512)
|0 G:(DE-HGF)POF4-5122
|c POF4-512
|f POF IV
|x 0
536 _ _ |a EPI SGA2 (16ME0507K)
|0 G:(BMBF)16ME0507K
|c 16ME0507K
|x 1
536 _ _ |a EPI SGA1 - SGA1 (Specific Grant Agreement 1) OF THE EUROPEAN PROCESSOR INITIATIVE (EPI) (826647)
|0 G:(EU-Grant)826647
|c 826647
|f H2020-SGA-LPMT-2018
|x 2
588 _ _ |a Dataset connected to DataCite
700 1 _ |a Falquez, Carlos
|0 P:(DE-Juel1)179531
|b 1
|u fzj
700 1 _ |a Portero, Antoni
|0 P:(DE-Juel1)177768
|b 2
700 1 _ |a Suarez, Estela
|0 P:(DE-Juel1)142361
|b 3
|u fzj
700 1 _ |a Pleiter, Dirk
|0 P:(DE-Juel1)144441
|b 4
773 _ _ |a 10.1109/ACCESS.2025.3569533
|0 PERI:(DE-600)2687964-5
|p 85898 - 85926
|t IEEE access
|v 13
|y 2025
|x 2169-3536
856 4 _ |u https://juser.fz-juelich.de/record/1042334/files/APC600663786.pdf
856 4 _ |y OpenAccess
|u https://juser.fz-juelich.de/record/1042334/files/Memory_Prefetching_Evaluation_of_Scientific_Applications_on_a_Modern_HPC_Arm-Based_Processor.pdf
909 C O |o oai:juser.fz-juelich.de:1042334
|p openaire
|p open_access
|p OpenAPC
|p driver
|p VDB
|p ec_fundedresources
|p openCost
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)176469
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 1
|6 P:(DE-Juel1)179531
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)142361
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-512
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Supercomputing & Big Data Infrastructures
|9 G:(DE-HGF)POF4-5122
|x 0
914 1 _ |y 2025
915 p c |a APC keys set
|0 PC:(DE-HGF)0000
|2 APC
915 p c |a DOAJ Journal
|0 PC:(DE-HGF)0003
|2 APC
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0160
|2 StatID
|b Essential Science Indicators
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1160
|2 StatID
|b Current Contents - Engineering, Computing and Technology
|d 2025-01-02
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
915 _ _ |a JCR
|0 StatID:(DE-HGF)0100
|2 StatID
|b IEEE ACCESS : 2022
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0501
|2 StatID
|b DOAJ Seal
|d 2024-04-03T10:39:05Z
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0500
|2 StatID
|b DOAJ
|d 2024-04-03T10:39:05Z
915 _ _ |a WoS
|0 StatID:(DE-HGF)0113
|2 StatID
|b Science Citation Index Expanded
|d 2025-01-02
915 _ _ |a Fees
|0 StatID:(DE-HGF)0700
|2 StatID
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
|d 2025-01-02
915 _ _ |a IF < 5
|0 StatID:(DE-HGF)9900
|2 StatID
|d 2025-01-02
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Peer Review
|0 StatID:(DE-HGF)0030
|2 StatID
|b DOAJ : Anonymous peer review
|d 2024-04-03T10:39:05Z
915 _ _ |a Article Processing Charges
|0 StatID:(DE-HGF)0561
|2 StatID
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1230
|2 StatID
|b Current Contents - Electronics and Telecommunications Collection
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Clarivate Analytics Master Journal List
|d 2025-01-02
920 _ _ |l yes
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a APC
980 1 _ |a APC
980 1 _ |a FullTexts

