| 001 | 1044949 | ||
| 005 | 20251030202115.0 | ||
| 024 | 7 | _ | |a 10.1007/s11227-025-07145-6 |2 doi |
| 024 | 7 | _ | |a 0920-8542 |2 ISSN |
| 024 | 7 | _ | |a 1573-0484 |2 ISSN |
| 024 | 7 | _ | |a 10.34734/FZJ-2025-03449 |2 datacite_doi |
| 037 | _ | _ | |a FZJ-2025-03449 |
| 082 | _ | _ | |a 620 |
| 100 | 1 | _ | |a Baumeister, Paul F. |0 P:(DE-Juel1)156619 |b 0 |e Corresponding author |
| 245 | _ | _ | |a tfQMRgpu: a GPU-accelerated linear solver with block-sparse complex result matrix |
| 260 | _ | _ | |a Dordrecht [u.a.] |c 2025 |b Springer Science + Business Media B.V |
| 336 | 7 | _ | |a article |2 DRIVER |
| 336 | 7 | _ | |a Output Types/Journal article |2 DataCite |
| 336 | 7 | _ | |a Journal Article |b journal |m journal |0 PUB:(DE-HGF)16 |s 1761835131_12063 |2 PUB:(DE-HGF) |
| 336 | 7 | _ | |a ARTICLE |2 BibTeX |
| 336 | 7 | _ | |a JOURNAL_ARTICLE |2 ORCID |
| 336 | 7 | _ | |a Journal Article |0 0 |2 EndNote |
| 520 | _ | _ | |a We present tfQMRgpu, a GPU-accelerated iterative linear solver based on the transpose-free quasi-minimal residual (tfQMR) method. Designed for large-scale electronic structure calculations, particularly in the context of Korringa–Kohn–Rostoker density functional theory, tfQMRgpu efficiently handles block-sparse complex matrices arising from multiple scattering theory. The solver exploits GPU parallelism to accelerate convergence while leveraging memory-efficient sparse storage formats. By unifying the solution of multiple right-hand side (RHS) block vectors, tfQMRgpu significantly improves throughput, demonstrating up to a speedup on modern GPUs. Additionally, we introduce a flexible implementation framework that supports both explicit matrix-based and matrix-free operator formulations, such as high-order finite-difference stencils for real-space grid-based Green function calculations. Benchmarks on various NVIDIA GPUs demonstrate the solver’s efficiency, in some cases achieving over 56% of peak floating-point performance for block-sparse matrix multiplications. tfQMRgpu is open-source, providing interfaces for C, C++, Fortran, Julia, and Python, making it a versatile tool for high-performance computing applications that can benefit from the unification of RHS problems. |
| 536 | _ | _ | |a 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511) |0 G:(DE-HGF)POF4-5111 |c POF4-511 |f POF IV |x 0 |
| 536 | _ | _ | |a 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511) |0 G:(DE-HGF)POF4-5112 |c POF4-511 |f POF IV |x 1 |
| 536 | _ | _ | |a 5122 - Future Computing & Big Data Systems (POF4-512) |0 G:(DE-HGF)POF4-5122 |c POF4-512 |f POF IV |x 2 |
| 536 | _ | _ | |a BMBF 01 1H1 6013, NRW 325 – 8.03 – 133340 - SiVeGCS (DB001492) |0 G:(DE-Juel-1)DB001492 |c DB001492 |x 3 |
| 536 | _ | _ | |a ATML-X-DEV - ATML Accelerating Devices (ATML-X-DEV) |0 G:(DE-Juel-1)ATML-X-DEV |c ATML-X-DEV |x 4 |
| 588 | _ | _ | |a Dataset connected to CrossRef, Journals: juser.fz-juelich.de |
| 700 | 1 | _ | |a Nassyr, Stepan |0 P:(DE-Juel1)172888 |b 1 |
| 773 | _ | _ | |a 10.1007/s11227-025-07145-6 |g Vol. 81, no. 5, p. 663 |0 PERI:(DE-600)1479917-0 |n 5 |p 663 |t The journal of supercomputing |v 81 |y 2025 |x 0920-8542 |
| 856 | 4 | _ | |u https://juser.fz-juelich.de/record/1044949/files/s11227-025-07145-6.pdf |y OpenAccess |
| 909 | C | O | |o oai:juser.fz-juelich.de:1044949 |p openaire |p open_access |p OpenAPC_DEAL |p driver |p VDB |p openCost |p dnbdelivery |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)156619 |
| 910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 1 |6 P:(DE-Juel1)172888 |
| 913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-511 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Enabling Computational- & Data-Intensive Science and Engineering |9 G:(DE-HGF)POF4-5111 |x 0 |
| 913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-511 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Enabling Computational- & Data-Intensive Science and Engineering |9 G:(DE-HGF)POF4-5112 |x 1 |
| 913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-512 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Supercomputing & Big Data Infrastructures |9 G:(DE-HGF)POF4-5122 |x 2 |
| 914 | 1 | _ | |y 2025 |
| 915 | p | c | |a APC keys set |0 PC:(DE-HGF)0000 |2 APC |
| 915 | p | c | |a DEAL: Springer Nature 2020 |0 PC:(DE-HGF)0113 |2 APC |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0200 |2 StatID |b SCOPUS |d 2024-12-18 |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0160 |2 StatID |b Essential Science Indicators |d 2024-12-18 |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)1160 |2 StatID |b Current Contents - Engineering, Computing and Technology |d 2024-12-18 |
| 915 | _ | _ | |a Creative Commons Attribution CC BY 4.0 |0 LIC:(DE-HGF)CCBY4 |2 HGFVOC |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0600 |2 StatID |b Ebsco Academic Search |d 2024-12-18 |
| 915 | _ | _ | |a JCR |0 StatID:(DE-HGF)0100 |2 StatID |b J SUPERCOMPUT : 2022 |d 2024-12-18 |
| 915 | _ | _ | |a WoS |0 StatID:(DE-HGF)0113 |2 StatID |b Science Citation Index Expanded |d 2024-12-18 |
| 915 | _ | _ | |a DEAL Springer |0 StatID:(DE-HGF)3002 |2 StatID |d 2024-12-18 |w ger |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0150 |2 StatID |b Web of Science Core Collection |d 2024-12-18 |
| 915 | _ | _ | |a IF < 5 |0 StatID:(DE-HGF)9900 |2 StatID |d 2024-12-18 |
| 915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
| 915 | _ | _ | |a Peer Review |0 StatID:(DE-HGF)0030 |2 StatID |b ASC |d 2024-12-18 |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0300 |2 StatID |b Medline |d 2024-12-18 |
| 915 | _ | _ | |a Nationallizenz |0 StatID:(DE-HGF)0420 |2 StatID |d 2024-12-18 |w ger |
| 915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0199 |2 StatID |b Clarivate Analytics Master Journal List |d 2024-12-18 |
| 920 | 1 | _ | |0 I:(DE-Juel1)JSC-20090406 |k JSC |l Jülich Supercomputing Center |x 0 |
| 980 | _ | _ | |a journal |
| 980 | _ | _ | |a VDB |
| 980 | _ | _ | |a UNRESTRICTED |
| 980 | _ | _ | |a I:(DE-Juel1)JSC-20090406 |
| 980 | _ | _ | |a APC |
| 980 | 1 | _ | |a APC |
| 980 | 1 | _ | |a FullTexts |