Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels
Contribution to conference proceedings / Contribution to a book | FZJ-2025-01495
2024
Springer Nature Switzerland
Cham
ISBN: 978-3-031-69765-4 (print), 978-3-031-69766-1 (electronic)
Please use a persistent id in citations: doi:10.1007/978-3-031-69766-1_4
Abstract: Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner onto typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be tuned appropriately. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increasing variety of instruction set architectures and SIMD instruction extensions, this becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of the design space of a processor core micro-architecture based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code using Arm’s SVE and RISC-V’s RVV instructions. The tool automates the generation of micro-kernels and therefore makes it possible to generate a large range of such kernels. The results provide insights both to micro-architecture designers and to micro-kernel developers. The assembler generator is open source, and the simulation data are available as supplementary material.
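To make the term "micro-kernel" concrete, the following is a minimal scalar sketch of the register-blocked GEMM micro-kernel structure commonly used in GotoBLAS/BLIS-style implementations. It is not the output of the paper's generator, which emits vectorised SVE/RVV assembly. The function name `microkernel_4x4`, the block sizes `MR` and `NR`, and the packing layout are assumptions chosen for illustration. The kernel accumulates an `MR x NR` block of C through a sequence of rank-1 (outer-product) updates over packed panels of A and B. In a vectorised version, the accumulators would be held in vector registers, and the inner loop over `i` would map onto SVE or RVV vector instructions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register-block sizes; in practice they are chosen from the
   vector length and the number of architectural vector registers. */
#define MR 4
#define NR 4

/* Scalar sketch of a GEMM micro-kernel: C[0:MR, 0:NR] += A_panel * B_panel.
   - a_packed: MR x kc panel of A, stored as kc consecutive columns of MR values
   - b_packed: kc x NR panel of B, stored as kc consecutive rows of NR values
   - c: column-major output block with leading dimension ldc */
void microkernel_4x4(size_t kc, const double *a_packed, const double *b_packed,
                     double *c, size_t ldc)
{
    assert(a_packed != NULL && b_packed != NULL && c != NULL);

    /* Accumulators; a tuned kernel keeps these in vector registers. */
    double acc[MR][NR] = {{0.0}};

    /* One rank-1 (outer-product) update per step of the k loop. */
    for (size_t p = 0; p < kc; ++p) {
        const double *ap = a_packed + p * MR;
        const double *bp = b_packed + p * NR;
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                acc[i][j] += ap[i] * bp[j];
    }

    /* Write the accumulated block back to C once, after the k loop. */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[i + j * ldc] += acc[i][j];
}
```

Keeping the whole `MR x NR` block in registers for the entire k loop is what makes the micro-kernel's throughput depend so directly on micro-architectural parameters such as the register count and the vector length.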