%0 Journal Article
%A Baumeister, Paul F.
%A Nassyr, Stepan
%T tfQMRgpu: a GPU-accelerated linear solver with block-sparse complex result matrix
%J The Journal of Supercomputing
%V 81
%N 5
%@ 0920-8542
%C Dordrecht [et al.]
%I Springer Science + Business Media B.V
%M FZJ-2025-03449
%P 663
%D 2025
%X We present tfQMRgpu, a GPU-accelerated iterative linear solver based on the transpose-free quasi-minimal residual (tfQMR) method. Designed for large-scale electronic structure calculations, particularly in the context of Korringa–Kohn–Rostoker density functional theory, tfQMRgpu efficiently handles block-sparse complex matrices arising from multiple scattering theory. The solver exploits GPU parallelism to accelerate convergence while leveraging memory-efficient sparse storage formats. By unifying the solution of multiple right-hand side (RHS) block vectors, tfQMRgpu significantly improves throughput, demonstrating up to a speedup on modern GPUs. Additionally, we introduce a flexible implementation framework that supports both explicit matrix-based and matrix-free operator formulations, such as high-order finite-difference stencils for real-space grid-based Green function calculations. Benchmarks on various NVIDIA GPUs demonstrate the solver’s efficiency, in some cases achieving over 56% of peak floating-point performance for block-sparse matrix multiplications. tfQMRgpu is open-source, providing interfaces for C, C++, Fortran, Julia, and Python, making it a versatile tool for high-performance computing applications that can benefit from the unification of RHS problems.
%F PUB:(DE-HGF)16
%9 Journal Article
%R 10.1007/s11227-025-07145-6
%U https://juser.fz-juelich.de/record/1044949