Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels

Nassyr, Stepan; Pleiter, Dirk

doi:10.1007/978-3-031-69766-1_4

Contribution to a conference proceedings/Contribution to a book

FZJ-2025-01495

Nassyr, S. (FZJ)* ; Pleiter, D. (Corresponding author)

2024
Springer Nature Switzerland, Cham
ISBN: 978-3-031-69765-4 (print), 978-3-031-69766-1 (electronic)

Euro-Par 2024: Parallel Processing
30th European Conference on Parallel and Distributed Processing, Euro-Par 2024, Madrid, Spain, 26 Aug 2024 - 30 Aug 2024. Cham: Springer Nature Switzerland, Lecture Notes in Computer Science 14802, 47 - 61 (2024) [10.1007/978-3-031-69766-1_4]

Please use a persistent id in citations: doi:10.1007/978-3-031-69766-1_4

Abstract: Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner on typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, this becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It enables the automated generation of micro-kernels and, therefore, the generation of a large range of such kernels. The results provide insights both to micro-architecture architects and to micro-kernel developers. The assembler generator is open-sourced and the simulation data is available as supplementary material.

Contributing Institute(s):