000860345 001__ 860345
000860345 005__ 20200914095008.0
000860345 020__ $$a978-3-540-62898-9 (print)
000860345 020__ $$a978-3-540-69041-2 (electronic)
000860345 0247_ $$2doi$$a10.1007/BFb0031605
000860345 0247_ $$2ISSN$$a0302-9743
000860345 0247_ $$2ISSN$$a1611-3349
000860345 037__ $$aFZJ-2019-01120
000860345 1001_ $$0P:(DE-HGF)0$$aHertzberger, Bob$$b0$$eEditor
000860345 1112_ $$aInternational Conference on High-Performance Computing and Networking$$cVienna$$d1997-04-28 - 1997-04-30$$wAustria
000860345 245__ $$aBLAS-3 for the quadrics parallel computer
000860345 260__ $$aBerlin, Heidelberg$$bSpringer Berlin Heidelberg$$c1997
000860345 29510 $$aHigh-Performance Computing and Networking / Hertzberger, Bob (Editor) ; Berlin, Heidelberg : Springer Berlin Heidelberg, 1997, Chapter 32 ; ISSN: 0302-9743=1611-3349 ; ISBN: 978-3-540-62898-9=978-3-540-69041-2 ; doi:10.1007/BFb0031573
000860345 300__ $$a332 - 341
000860345 3367_ $$2ORCID$$aCONFERENCE_PAPER
000860345 3367_ $$033$$2EndNote$$aConference Paper
000860345 3367_ $$2BibTeX$$aINPROCEEDINGS
000860345 3367_ $$2DRIVER$$aconferenceObject
000860345 3367_ $$2DataCite$$aOutput Types/Conference Paper
000860345 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1600069773_28885
000860345 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
000860345 4900_ $$aLecture Notes in Computer Science$$v1225
000860345 520__ $$aA scalable parallel algorithm for matrix multiplication on SISAMD computers is presented. Our method enables us to implement an efficient BLAS library on the Italian APE100/Quadrics SISAMD massively parallel computer, on which scalable parallel BLAS-3 routines were hitherto not available. The proposed approach is based on one-dimensional ring connectivity; the flow of data is hyper-systolic. The communication overhead is competitive with that of established algorithms for SIMD and MIMD machines. The advantages are that (i) the layout of the matrices is preserved during the computation, (ii) BLAS-2 routines fit well into this layout, and (iii) indexed addressing is avoided, which makes the algorithm suitable for SISAMD machines and thereby for all other types of parallel computers. On the APE100/Quadrics, nearly 25 % of peak performance is achieved for multiplications of complex matrices.
000860345 588__ $$aDataset connected to CrossRef Book Series
000860345 7001_ $$0P:(DE-HGF)0$$aSloot, Peter$$b1$$eEditor
000860345 7001_ $$0P:(DE-Juel1)132179$$aLippert, Th.$$b2$$ufzj
000860345 7001_ $$0P:(DE-HGF)0$$aPetkov, N.$$b3
000860345 7001_ $$0P:(DE-HGF)0$$aSchilling, K.$$b4
000860345 773__ $$a10.1007/BFb0031605
000860345 909CO $$ooai:juser.fz-juelich.de:860345$$pextern4vita
000860345 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132179$$aForschungszentrum Jülich$$b2$$kFZJ
000860345 915__ $$0StatID:(DE-HGF)0420$$2StatID$$aNationallizenz
000860345 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000860345 980__ $$acontrib
000860345 980__ $$aEDITORS
000860345 980__ $$acontb
000860345 980__ $$aI:(DE-Juel1)JSC-20090406
000860345 980__ $$aI:(DE-Juel1)NIC-20090406
000860345 9801_ $$aEXTERN4VITA