Journal Article FZJ-2025-03449

tfQMRgpu: a GPU-accelerated linear solver with block-sparse complex result matrix

2025
Springer Science + Business Media B.V., Dordrecht [and others]

The Journal of Supercomputing 81(5), 663 [doi:10.1007/s11227-025-07145-6]

Abstract: We present tfQMRgpu, a GPU-accelerated iterative linear solver based on the transpose-free quasi-minimal residual (tfQMR) method. Designed for large-scale electronic structure calculations, particularly in the context of Korringa–Kohn–Rostoker density functional theory, tfQMRgpu efficiently handles block-sparse complex matrices arising from multiple scattering theory. The solver exploits GPU parallelism to accelerate convergence while leveraging memory-efficient sparse storage formats. By unifying the solution of multiple right-hand side (RHS) block vectors, tfQMRgpu significantly improves throughput, demonstrating a speedup on modern GPUs. Additionally, we introduce a flexible implementation framework that supports both explicit matrix-based and matrix-free operator formulations, such as high-order finite-difference stencils for real-space grid-based Green function calculations. Benchmarks on various NVIDIA GPUs demonstrate the solver’s efficiency, in some cases achieving over 56% of peak floating-point performance for block-sparse matrix multiplications. tfQMRgpu is open-source, providing interfaces for C, C++, Fortran, Julia, and Python, making it a versatile tool for high-performance computing applications that can benefit from the unification of RHS problems.

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
  2. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  3. 5122 - Future Computing & Big Data Systems (POF4-512)
  4. BMBF 01 1H1 6013, NRW 325 – 8.03 – 133340 - SiVeGCS (DB001492)
  5. ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; Clarivate Analytics Master Journal List ; Current Contents - Engineering, Computing and Technology ; DEAL Springer ; Ebsco Academic Search ; Essential Science Indicators ; IF < 5 ; JCR ; Nationallizenz ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

 Record created 2025-08-11, last modified 2025-10-30

