Polynomial Preconditioning for the Conjugate Gradient Method on Massively Parallel Systems
Contribution to a conference proceedings | FZJ-2015-03205
1995
Institut für Informatik
Clausthal-Zellerfeld
Please use a persistent ID in citations: http://hdl.handle.net/2128/8618
Abstract: A frequently used iterative algorithm for solving large, sparse, symmetric positive definite systems of linear equations is the method of conjugate gradients (CG). This method requires one matrix-vector product and several dot products in each iteration. Convergence depends on the condition number of the coefficient matrix, so preconditioning techniques are used to reduce the number of iterations. In this context, polynomial preconditioning was developed. This method decreases the total number of dot products by reducing the total number of iterations. Of course, some additional work has to be done for the preconditioning: when a polynomial of degree k is used, k matrix-vector products per iteration have to be calculated rather than one. On scalar machines, this shift between matrix-vector products and dot products influences the performance of the algorithm only slightly. On massively parallel systems, however, dot products require global synchronization, while the calculation of matrix-vector products merely results in communication with a small number of processors. Hence, polynomial preconditioned CG can be expected to scale better than CG without preconditioning. This does not hold in general, of course: the performance of this preconditioner depends on several issues, e.g., the sparsity pattern and the eigenvalue distribution of the matrix, an efficient communication scheme for the matrix-vector products, and the time needed for global synchronization on the specific parallel machine. The actual implementation used here is based on Chebyshev polynomials. Performance tests were carried out on the Intel Paragon XP/S 10 with 140 nodes at the Research Centre Jülich (KFA). The CG method with polynomial preconditioning shows better performance and scalability than the basic method on massively parallel machines. In addition, there are numerical advantages such as higher accuracy and increased stability.
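As an illustration of the scheme the abstract describes, below is a minimal NumPy sketch, not the authors' Paragon implementation: the preconditioner z ≈ A⁻¹r is applied by running k steps of Chebyshev iteration, which costs k matrix-vector products but no dot products, so the outer PCG loop keeps only two dot products (global synchronizations) per iteration. The function names, the degree k = 5, and the assumption that spectral bounds 0 < lmin ≤ λ(A) ≤ lmax are known are all illustrative.

```python
# Hypothetical sketch of Chebyshev polynomial preconditioned CG (serial NumPy).
import numpy as np

def chebyshev_apply(A, r, lmin, lmax, k):
    """Apply the degree-k Chebyshev polynomial preconditioner z ~ A^{-1} r
    by running k steps of Chebyshev iteration for A z = r, starting at z = 0.
    Assumes known bounds 0 < lmin <= lambda(A) <= lmax. Costs k mat-vecs,
    no dot products."""
    theta = 0.5 * (lmax + lmin)   # centre of the spectral interval
    delta = 0.5 * (lmax - lmin)   # half-width of the spectral interval
    sigma1 = theta / delta
    rho = 1.0 / sigma1
    d = r / theta                 # first correction direction
    z = d.copy()
    res = r - A @ z               # inner residual; one mat-vec per step
    for _ in range(k - 1):
        rho_new = 1.0 / (2.0 * sigma1 - rho)
        d = rho_new * rho * d + (2.0 * rho_new / delta) * res
        z += d
        res -= A @ d
        rho = rho_new
    return z

def pcg(A, b, lmin, lmax, k=5, tol=1e-8, maxit=1000):
    """Preconditioned CG. On a parallel machine the dot products (r.z, p.Ap)
    would need global reductions, while the mat-vecs need only communication
    with neighbouring processors."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = chebyshev_apply(A, r, lmin, lmax, k)
    p = z.copy()
    rz = r @ z                    # dot product -> global synchronization
    bnorm = np.linalg.norm(b)
    for _ in range(maxit):
        Ap = A @ p
        alpha = rz / (p @ Ap)     # dot product -> global synchronization
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= tol * bnorm:
            break
        z = chebyshev_apply(A, r, lmin, lmax, k)
        rz_new = r @ z
        beta = rz_new / rz
        p = z + beta * p
        rz = rz_new
    return x

# Example: 1-D Laplacian; its eigenvalues are 4 sin^2(j*pi / (2(n+1))) < 4.
n = 200
A = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b = np.ones(n)
x = pcg(A, b, lmin=4.0 * np.sin(np.pi / (2 * (n + 1)))**2, lmax=4.0, k=5)
```

Running k steps of Chebyshev iteration is one standard way to realize a fixed degree-k polynomial in A as a preconditioner; the abstract does not specify the authors' exact construction, only that it is based on Chebyshev polynomials.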