Contribution to a conference proceedings/Contribution to a book FZJ-2025-01495

Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels


2024
Springer Nature Switzerland, Cham
ISBN: 978-3-031-69765-4 (print), 978-3-031-69766-1 (electronic)

Euro-Par 2024: Parallel Processing
30th European Conference on Parallel and Distributed Processing, Euro-Par 2024, Madrid, Spain, 26 Aug 2024 - 30 Aug 2024
Cham: Springer Nature Switzerland, Lecture Notes in Computer Science 14802, 47 - 61 (2024) [10.1007/978-3-031-69766-1_4]


Please use a persistent id in citations: doi:

Abstract: Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner onto typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, writing these kernels by hand becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It automates the generation of micro-kernels and therefore makes it possible to generate a large range of such kernels. The results provide insights both to micro-architecture architects and to micro-kernel developers. The assembler generator is open-sourced, and the simulation data is available as supplementary material.
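To make the notion of a BLAS3 micro-kernel concrete, here is a minimal C sketch of the register-blocked structure commonly used in GotoBLAS/BLIS-style GEMM implementations. It is not the paper's generated code: the block sizes `MR` and `NR`, the packing layouts, and the names `micro_kernel` and `gemm_packed` are illustrative assumptions. The micro-kernel keeps an `MR` x `NR` tile of C in local accumulators and performs one rank-1 update per step of the `k` loop. An SVE or RVV kernel would typically turn the inner loop over `i` into vector loads and fused multiply-adds whose vector length is only known at run time; this sketch leaves vectorisation to the compiler.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical register-block sizes; real kernels choose MR and NR to fit
   the vector length and register file of the target core. */
#define MR 4
#define NR 4

/* Micro-kernel: C[0:MR, 0:NR] += A_panel * B_panel.
   a: packed A micro-panel, kc slivers of MR contiguous elements (one column each).
   b: packed B micro-panel, kc slivers of NR contiguous elements (one row each).
   c: column-major C tile with leading dimension ldc. */
static void micro_kernel(size_t kc, const double *a, const double *b,
                         double *c, size_t ldc)
{
    double acc[NR][MR] = {{0.0}};  /* accumulator tile, ideally kept in registers */

    for (size_t p = 0; p < kc; ++p) {
        /* rank-1 update: column p of the A panel times row p of the B panel */
        for (size_t j = 0; j < NR; ++j) {
            const double bj = b[p * NR + j];      /* scalar broadcast from B */
            for (size_t i = 0; i < MR; ++i)       /* candidate for vectorisation over i */
                acc[j][i] += a[p * MR + i] * bj;
        }
    }

    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[i + j * ldc] += acc[j][i];
}

/* Simplified driver, all matrices column-major:
   C (m x n, ld m) += A (m x k, ld m) * B (k x n, ld k).
   Requires m % MR == 0 and n % NR == 0; no cache blocking. Returns 0 on success. */
static int gemm_packed(size_t m, size_t n, size_t k,
                       const double *A, const double *B, double *C)
{
    assert(m % MR == 0 && n % NR == 0);
    if (k == 0)
        return 0;

    double *apack = malloc(k * MR * sizeof *apack);
    double *bpack = malloc(k * NR * sizeof *bpack);
    if (!apack || !bpack) {
        free(apack);
        free(bpack);
        return -1;
    }

    for (size_t jc = 0; jc < n; jc += NR) {
        /* pack B[0:k, jc:jc+NR] row by row */
        for (size_t p = 0; p < k; ++p)
            for (size_t j = 0; j < NR; ++j)
                bpack[p * NR + j] = B[p + (jc + j) * k];

        for (size_t ic = 0; ic < m; ic += MR) {
            /* pack A[ic:ic+MR, 0:k] column by column */
            for (size_t p = 0; p < k; ++p)
                for (size_t i = 0; i < MR; ++i)
                    apack[p * MR + i] = A[(ic + i) + p * m];
            micro_kernel(k, apack, bpack, &C[ic + jc * m], m);
        }
    }

    free(apack);
    free(bpack);
    return 0;
}
```

The driver omits the multi-level cache blocking that production libraries apply around the micro-kernel; it only shows how packed panels feed an `MR` x `NR` register tile.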


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  2. 5122 - Future Computing & Big Data Systems (POF4-512)
  3. PhD no Grant - Doctoral researcher without specific funding (PHD-NO-GRANT-20170405)
  4. EPI SGA2 (16ME0507K)

Appears in the scientific report 2024
Database coverage:
Nationallizenz ; SCOPUS

The record appears in these collections:
Document types > Events > Contributions to proceedings
Document types > Books > Book contribution
Workflow collections > Public records
Institute collections > JSC
Publication database

Record created on 2025-01-30, last modified on 2025-02-03


Restricted:
Preprint - full text (PDF)
978-3-031-69766-1 - full text (PDF)