BLAS-3 for the Quadrics parallel computer

Lippert, Th.; Petkov, N.; Schilling, K.; Hertzberger, Bob (Editor); Sloot, Peter (Editor)

doi:10.1007/BFb0031605

Marc 21

001			860345
005			20200914095008.0
020	_	_	\|a 978-3-540-62898-9 (print)
020	_	_	\|a 978-3-540-69041-2 (electronic)
024	7	_	\|a 10.1007/BFb0031605 \|2 doi
024	7	_	\|a 0302-9743 \|2 ISSN
024	7	_	\|a 1611-3349 \|2 ISSN
037	_	_	\|a FZJ-2019-01120
100	1	_	\|a Hertzberger, Bob \|0 P:(DE-HGF)0 \|b 0 \|e Editor
111	2	_	\|a International Conference on High-Performance Computing and Networking \|c Vienna \|d 1997-04-28 - 1997-04-30 \|w Austria
245	_	_	\|a BLAS-3 for the quadrics parallel computer
260	_	_	\|a Berlin, Heidelberg \|c 1997 \|b Springer Berlin Heidelberg
295	1	0	\|a High-Performance Computing and Networking / Hertzberger, Bob (Editor) ; Berlin, Heidelberg : Springer Berlin Heidelberg, 1997, Chapter 32 ; ISSN: 0302-9743=1611-3349 ; ISBN: 978-3-540-62898-9=978-3-540-69041-2 ; doi:10.1007/BFb0031573
300	_	_	\|a 332 - 341
336	7	_	\|a CONFERENCE_PAPER \|2 ORCID
336	7	_	\|a Conference Paper \|0 33 \|2 EndNote
336	7	_	\|a INPROCEEDINGS \|2 BibTeX
336	7	_	\|a conferenceObject \|2 DRIVER
336	7	_	\|a Output Types/Conference Paper \|2 DataCite
336	7	_	\|a Contribution to a conference proceedings \|b contrib \|m contrib \|0 PUB:(DE-HGF)8 \|s 1600069773_28885 \|2 PUB:(DE-HGF)
336	7	_	\|a Contribution to a book \|0 PUB:(DE-HGF)7 \|2 PUB:(DE-HGF) \|m contb
490	0	_	\|a Lecture Notes in Computer Science \|v 1225
520	_	_	\|a A scalable parallel algorithm for matrix multiplication on SISAMD computers is presented. Our method enables us to implement an efficient BLAS library on the Italian APE100/Quadrics SISAMD massively parallel computer on which hitherto scalable parallel BLAS-3 were not available. The approach proposed is based on a one-dimensional ring connectivity. The flow of data is hyper-systolic. The communication overhead is competitive with that of established algorithms for SIMD and MIMD machines. Advantages are that (i) the layout of the matrices is preserved during the computation, (ii) BLAS-2 fit well into this layout and (iii) indexed addressing is avoided, which renders the algorithm suitable for SISAMD machines and, in this way, for all other types of parallel computers. On the APE100/Quadrics, a performance of nearly 25 % of the peak performance for multiplications of complex matrices is achieved.
588	_	_	\|a Dataset connected to CrossRef Book Series
700	1	_	\|a Sloot, Peter \|0 P:(DE-HGF)0 \|b 1 \|e Editor
700	1	_	\|a Lippert, Th. \|0 P:(DE-Juel1)132179 \|b 2 \|u fzj
700	1	_	\|a Petkov, N. \|0 P:(DE-HGF)0 \|b 3
700	1	_	\|a Schilling, K. \|0 P:(DE-HGF)0 \|b 4
773	_	_	\|a 10.1007/BFb0031605
909	C	O	\|p extern4vita \|o oai:juser.fz-juelich.de:860345
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 2 \|6 P:(DE-Juel1)132179
915	_	_	\|a Nationallizenz \|0 StatID:(DE-HGF)0420 \|2 StatID
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0200 \|2 StatID \|b SCOPUS
980	_	_	\|a contrib
980	_	_	\|a EDITORS
980	_	_	\|a contb
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	_	_	\|a I:(DE-Juel1)NIC-20090406
980	1	_	\|a EXTERN4VITA
