001 | 1038510 | ||
005 | 20250203124521.0 | ||
020 | _ | _ | |a 978-3-031-69765-4 (print) |
020 | _ | _ | |a 978-3-031-69766-1 (electronic) |
024 | 7 | _ | |a 10.1007/978-3-031-69766-1_4 |2 doi |
024 | 7 | _ | |a 0302-9743 |2 ISSN |
024 | 7 | _ | |a 1611-3349 |2 ISSN |
024 | 7 | _ | |a WOS:001308370400004 |2 WOS |
037 | _ | _ | |a FZJ-2025-01495 |
100 | 1 | _ | |a Nassyr, Stepan |0 P:(DE-Juel1)172888 |b 0 |u fzj |
111 | 2 | _ | |a 30th European Conference on Parallel and Distributed Processing |g Euro-Par 2024 |c Madrid |d 2024-08-26 - 2024-08-30 |w Spain |
245 | _ | _ | |a Exploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels |
260 | _ | _ | |a Cham |c 2024 |b Springer Nature Switzerland |
295 | 1 | 0 | |a Euro-Par 2024: Parallel Processing |
300 | _ | _ | |a 47 - 61 |
336 | 7 | _ | |a CONFERENCE_PAPER |2 ORCID |
336 | 7 | _ | |a Conference Paper |0 33 |2 EndNote |
336 | 7 | _ | |a INPROCEEDINGS |2 BibTeX |
336 | 7 | _ | |a conferenceObject |2 DRIVER |
336 | 7 | _ | |a Output Types/Conference Paper |2 DataCite |
336 | 7 | _ | |a Contribution to a conference proceedings |b contrib |m contrib |0 PUB:(DE-HGF)8 |s 1738307726_672 |2 PUB:(DE-HGF) |
336 | 7 | _ | |a Contribution to a book |0 PUB:(DE-HGF)7 |2 PUB:(DE-HGF) |m contb |
490 | 0 | _ | |a Lecture Notes in Computer Science |v 14802 |
520 | _ | _ | |a Dense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner on typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, this becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It enables automatisation of the generation of micro-kernels and, therefore, the generation of a large range of such kernels. The results provide insights to both micro-architecture architects and micro-kernel developers. The assembler generator is open-sourced and the simulation data is available as supplementary material. |
536 | _ | _ | |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) |0 G:(DE-HGF)POF4-5112 |c POF4-511 |f POF IV |x 0 |
536 | _ | _ | |a 5122 - Future Computing & Big Data Systems (POF4-512) |0 G:(DE-HGF)POF4-5122 |c POF4-512 |f POF IV |x 1 |
536 | _ | _ | |a PhD no Grant - Doktorand ohne besondere Förderung (PHD-NO-GRANT-20170405) |0 G:(DE-Juel1)PHD-NO-GRANT-20170405 |c PHD-NO-GRANT-20170405 |x 2 |
536 | _ | _ | |a EPI SGA2 (16ME0507K) |0 G:(BMBF)16ME0507K |c 16ME0507K |x 3 |
588 | _ | _ | |a Dataset connected to CrossRef Book Series, Journals: juser.fz-juelich.de |
700 | 1 | _ | |a Pleiter, Dirk |0 P:(DE-Juel1)144441 |b 1 |e Corresponding author |
770 | _ | _ | |z 978-3-031-69765-4=978-3-031-69766-1 |
773 | _ | _ | |a 10.1007/978-3-031-69766-1_4 |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1038510/files/978-3-031-69766-1.pdf |y Restricted |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1038510/files/Preprint.pdf |y Restricted |
909 | C | O | |o oai:juser.fz-juelich.de:1038510 |p VDB |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)172888 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-511 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Enabling Computational- & Data-Intensive Science and Engineering |9 G:(DE-HGF)POF4-5112 |x 0 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-512 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Supercomputing & Big Data Infrastructures |9 G:(DE-HGF)POF4-5122 |x 1 |
914 | 1 | _ | |y 2024 |
915 | _ | _ | |a Nationallizenz |0 StatID:(DE-HGF)0420 |2 StatID |d 2024-12-28 |w ger |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0200 |2 StatID |b SCOPUS |d 2024-12-28 |
920 | _ | _ | |l yes |
920 | 1 | _ | |0 I:(DE-Juel1)JSC-20090406 |k JSC |l Jülich Supercomputing Center |x 0 |
980 | _ | _ | |a contrib |
980 | _ | _ | |a VDB |
980 | _ | _ | |a contb |
980 | _ | _ | |a I:(DE-Juel1)JSC-20090406 |
980 | _ | _ | |a UNRESTRICTED |