001038510 001__ 1038510
001038510 005__ 20250203124521.0
001038510 020__ $$a978-3-031-69765-4 (print)
001038510 020__ $$a978-3-031-69766-1 (electronic)
001038510 0247_ $$2doi$$a10.1007/978-3-031-69766-1_4
001038510 0247_ $$2ISSN$$a0302-9743
001038510 0247_ $$2ISSN$$a1611-3349
001038510 0247_ $$2WOS$$aWOS:001308370400004
001038510 037__ $$aFZJ-2025-01495
001038510 1001_ $$0P:(DE-Juel1)172888$$aNassyr, Stepan$$b0$$ufzj
001038510 1112_ $$a30th European Conference on Parallel and Distributed Processing$$cMadrid$$d2024-08-26 - 2024-08-30$$gEuro-Par 2024$$wSpain
001038510 245__ $$aExploring Processor Micro-architectures Optimised for BLAS3 Micro-kernels
001038510 260__ $$aCham$$bSpringer Nature Switzerland$$c2024
001038510 29510 $$aEuro-Par 2024: Parallel Processing
001038510 300__ $$a47 - 61
001038510 3367_ $$2ORCID$$aCONFERENCE_PAPER
001038510 3367_ $$033$$2EndNote$$aConference Paper
001038510 3367_ $$2BibTeX$$aINPROCEEDINGS
001038510 3367_ $$2DRIVER$$aconferenceObject
001038510 3367_ $$2DataCite$$aOutput Types/Conference Paper
001038510 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1738307726_672
001038510 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
001038510 4900_ $$aLecture Notes in Computer Science$$v14802
001038510 520__ $$aDense matrix-matrix operations are relevant for a broad range of numerical applications, e.g. for implementing deep neural networks. Past research has led to a good understanding of how these operations can be mapped in a generic manner onto typical processor architectures with multiple cache levels such that near-optimal performance can be reached. However, while commonly used micro-architectures are typically suitable for such operations, their architectural parameters need to be suitably tuned. The performance of highly optimised implementations of these operations relies on micro-kernels that are often handwritten. Given the increased variety of instruction set architectures and SIMD instruction extensions, this becomes challenging. In this paper, we present and implement a methodology for an exhaustive exploration of a processor core micro-architecture design space based on gem5 simulations. Furthermore, we present a tool for generating efficiently vectorised code leveraging Arm’s SVE and RISC-V’s RVV instructions. It enables automatisation of the generation of micro-kernels and, therefore, the generation of a large range of such kernels. The results provide insights both to micro-architecture architects and to micro-kernel developers. The assembler generator is open-sourced and the simulation data is available as supplementary material.
001038510 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001038510 536__ $$0G:(DE-HGF)POF4-5122$$a5122 - Future Computing & Big Data Systems (POF4-512)$$cPOF4-512$$fPOF IV$$x1
001038510 536__ $$0G:(DE-Juel1)PHD-NO-GRANT-20170405$$aPhD no Grant - Doktorand ohne besondere Förderung (PHD-NO-GRANT-20170405)$$cPHD-NO-GRANT-20170405$$x2
001038510 536__ $$0G:(BMBF)16ME0507K$$aEPI SGA2 (16ME0507K)$$c16ME0507K$$x3
001038510 588__ $$aDataset connected to CrossRef Book Series, Journals: juser.fz-juelich.de
001038510 7001_ $$0P:(DE-Juel1)144441$$aPleiter, Dirk$$b1$$eCorresponding author
001038510 770__ $$z978-3-031-69765-4=978-3-031-69766-1
001038510 773__ $$a10.1007/978-3-031-69766-1_4
001038510 8564_ $$uhttps://juser.fz-juelich.de/record/1038510/files/978-3-031-69766-1.pdf$$yRestricted
001038510 8564_ $$uhttps://juser.fz-juelich.de/record/1038510/files/Preprint.pdf$$yRestricted
001038510 909CO $$ooai:juser.fz-juelich.de:1038510$$pVDB
001038510 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172888$$aForschungszentrum Jülich$$b0$$kFZJ
001038510 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001038510 9131_ $$0G:(DE-HGF)POF4-512$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5122$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vSupercomputing & Big Data Infrastructures$$x1
001038510 9141_ $$y2024
001038510 915__ $$0StatID:(DE-HGF)0420$$2StatID$$aNationallizenz$$d2024-12-28$$wger
001038510 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-28
001038510 920__ $$lyes
001038510 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001038510 980__ $$acontrib
001038510 980__ $$aVDB
001038510 980__ $$acontb
001038510 980__ $$aI:(DE-Juel1)JSC-20090406
001038510 980__ $$aUNRESTRICTED