000860345 001__ 860345
000860345 005__ 20200914095008.0
000860345 020__ $$a978-3-540-62898-9 (print)
000860345 020__ $$a978-3-540-69041-2 (electronic)
000860345 0247_ $$2doi$$a10.1007/BFb0031605
000860345 0247_ $$2ISSN$$a0302-9743
000860345 0247_ $$2ISSN$$a1611-3349
000860345 037__ $$aFZJ-2019-01120
000860345 1001_ $$0P:(DE-HGF)0$$aHertzberger, Bob$$b0$$eEditor
000860345 1112_ $$aInternational Conference on High-Performance Computing and Networking$$cVienna$$d1997-04-28 - 1997-04-30$$wAustria
000860345 245__ $$aBLAS-3 for the quadrics parallel computer
000860345 260__ $$aBerlin, Heidelberg$$bSpringer Berlin Heidelberg$$c1997
000860345 29510 $$aHigh-Performance Computing and Networking / Hertzberger, Bob (Editor) ; Berlin, Heidelberg : Springer Berlin Heidelberg, 1997, Chapter 32 ; ISSN: 0302-9743=1611-3349 ; ISBN: 978-3-540-62898-9=978-3-540-69041-2 ; doi:10.1007/BFb0031573
000860345 300__ $$a332 - 341
000860345 3367_ $$2ORCID$$aCONFERENCE_PAPER
000860345 3367_ $$033$$2EndNote$$aConference Paper
000860345 3367_ $$2BibTeX$$aINPROCEEDINGS
000860345 3367_ $$2DRIVER$$aconferenceObject
000860345 3367_ $$2DataCite$$aOutput Types/Conference Paper
000860345 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1600069773_28885
000860345 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
000860345 4900_ $$aLecture Notes in Computer Science$$v1225
000860345 520__ $$aA scalable parallel algorithm for matrix multiplication on SISAMD computers is presented. Our method enables us to implement an efficient BLAS library on the Italian APE100/Quadrics SISAMD massively parallel computer, on which scalable parallel BLAS-3 were hitherto not available. The proposed approach is based on a one-dimensional ring connectivity. The flow of data is hyper-systolic. The communication overhead is competitive with that of established algorithms for SIMD and MIMD machines. The advantages are that (i) the layout of the matrices is preserved during the computation, (ii) BLAS-2 operations fit well into this layout, and (iii) indexed addressing is avoided, which renders the algorithm suitable for SISAMD machines and, in this way, for all other types of parallel computers. On the APE100/Quadrics, a performance of nearly 25% of the peak performance is achieved for multiplications of complex matrices.
000860345 588__ $$aDataset connected to CrossRef Book Series
000860345 7001_ $$0P:(DE-HGF)0$$aSloot, Peter$$b1$$eEditor
000860345 7001_ $$0P:(DE-Juel1)132179$$aLippert, Th.$$b2$$ufzj
000860345 7001_ $$0P:(DE-HGF)0$$aPetkov, N.$$b3
000860345 7001_ $$0P:(DE-HGF)0$$aSchilling, K.$$b4
000860345 773__ $$a10.1007/BFb0031605
000860345 909CO $$ooai:juser.fz-juelich.de:860345$$pextern4vita
000860345 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132179$$aForschungszentrum Jülich$$b2$$kFZJ
000860345 915__ $$0StatID:(DE-HGF)0420$$2StatID$$aNationallizenz
000860345 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000860345 980__ $$acontrib
000860345 980__ $$aEDITORS
000860345 980__ $$acontb
000860345 980__ $$aI:(DE-Juel1)JSC-20090406
000860345 980__ $$aI:(DE-Juel1)NIC-20090406
000860345 9801_ $$aEXTERN4VITA