%0 Conference Paper
%A Nassyr, Stepan
%A Pleiter, Dirk
%T Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels
%V 14802
%C Cham
%I Springer Nature Switzerland
%M FZJ-2025-01495
%@ 978-3-031-69765-4 (print)
%B Lecture Notes in Computer Science
%P 47 - 61
%D 2024
%< Euro-Par 2024: Parallel Processing
%X Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner on typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, this becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It enables automatisation of the generation of micro-kernels and, therefore, the generation of a large range of such kernels. The results provide insights both to micro-architecture architects and to micro-kernel developers. The assembler generator is open-sourced and the simulation data is available as supplementary material.
%B 30th European Conference on Parallel and Distributed Processing
%C 26 Aug 2024 - 30 Aug 2024, Madrid (Spain)
Y2 26 Aug 2024 - 30 Aug 2024
M2 Madrid, Spain
%F PUB:(DE-HGF)8 ; PUB:(DE-HGF)7
%9 Contribution to a conference proceedings ; Contribution to a book
%U <Go to ISI:>//WOS:001308370400004
%R 10.1007/978-3-031-69766-1_4
%U https://juser.fz-juelich.de/record/1038510