TY - CONF
AU - Nassyr, Stepan
AU - Pleiter, Dirk
TI - Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels
VL - 14802
CY - Cham
PB - Springer Nature Switzerland
M1 - FZJ-2025-01495
SN - 978-3-031-69765-4 (print)
T2 - Lecture Notes in Computer Science
SP - 47
EP - 61
PY - 2024
AB - Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner on typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, this becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It enables automatisation of the generation of micro-kernels and, therefore, the generation of a large range of such kernels. The results provide insights to both micro-architecture architects and micro-kernel developers. The assembler generator is open-sourced and the simulation data is available as supplementary material.
T2 - 30th European Conference on Parallel and Distributed Processing
CY - 26 Aug 2024 - 30 Aug 2024, Madrid (Spain)
Y2 - 26 Aug 2024 - 30 Aug 2024
M2 - Madrid, Spain
LB - PUB:(DE-HGF)8 ; PUB:(DE-HGF)7
UR - <Go to ISI:>//WOS:001308370400004
DO - 10.1007/978-3-031-69766-1_4
UR - https://juser.fz-juelich.de/record/1038510
ER -