A Parallel-in-Space Simulator for Accelerating Power System Simulation on Graphics Processing Units

Zhang, Junjie

Book/Dissertation / PhD Thesis

FZJ-2026-01635

Zhang, J. (Corresponding author) FZJ*

2026
Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag Jülich
ISBN: 978-3-95806-882-7

Jülich : Forschungszentrum Jülich GmbH Zentralbibliothek, Verlag, Schriften des Forschungszentrums Jülich Reihe Energie & Umwelt / Energy & Environment 691, 112 pp. (2026) [10.34734/FZJ-2026-01635] = Dissertation, RWTH Aachen University, 2025

Please use a persistent id in citations: urn:nbn:de:0001-2603311227325.651063592558 doi:10.34734/FZJ-2026-01635

Abstract: The requirements on power system simulation techniques are becoming more demanding due to the growing size and complexity of modern power systems. To address these challenges, we apply parallel-in-space algorithms and use GPUs for acceleration. The results show that our prototype achieves speedups of orders of magnitude over execution on conventional multi-core processors. We observed speedups of up to more than 80× over an optimized sequential simulation, as well as faster-than-real-time execution. On the algorithm level, the parallel-in-space approach proposed in this dissertation restructures computations into single-instruction-multiple-threads (SIMT) form so that they can be mapped to and executed on GPUs efficiently. Furthermore, the dependencies among computing tasks are exploited so that task-level parallelism can be achieved on top of data parallelism. The power system model also takes advantage of shifted frequency analysis to balance accuracy against computational burden. Moreover, the parallel-in-space algorithm is combined with a parallel-in-time algorithm to exploit parallelism on the temporal level and achieve higher speedups. We analyzed the spatial and temporal algorithms with respect to computing efficiency and convergence in order to achieve better throughput. On the implementation level, the prototype simulator was developed using data-oriented design instead of traditional object-oriented design, so that the hardware accelerator as well as the space-time parallel algorithm can be exploited more efficiently. Moreover, we exploited the similarity of heterogeneous computing frameworks and implemented a flexibility layer that abstracts different frameworks based on the host-device model, so that the simulator can use GPUs under different computing frameworks as well as other hardware accelerators such as FPGAs.
The computing performance of component computations on the GPU is tuned automatically using our approach based on automated empirical optimization of software. The approach can generate high-performance kernels for numerical integration routines on GPUs. Benchmarks show that these optimized integration routines outperform routines implemented with standard GPU-based linear algebra libraries, with speedups between 1.3× and 6.7×.
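
As a rough illustration of the data-oriented, parallel-in-space idea described in the abstract, the following Python sketch stores the state of many identical components in flat arrays (a structure-of-arrays layout) and applies the same update to every element. This is the access pattern that tends to map well onto GPU threads. It is not taken from the dissertation: the component model (a series RL branch integrated with the trapezoidal rule), the names `RLBranches` and `step_all`, and all parameter values are assumptions chosen purely for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class RLBranches:
    """Hypothetical structure-of-arrays storage: one list per field, one index per branch."""
    resistance: List[float]    # ohms
    inductance: List[float]    # henries
    current: List[float]       # amperes (state variable)
    voltage_prev: List[float]  # volts applied at the previous time step


def step_all(b: RLBranches, voltage: List[float], dt: float) -> None:
    """Advance every branch by one trapezoidal step of L di/dt = v - R i.

    Each loop iteration touches only index k, so the iterations are
    independent; on a GPU one thread per index would be a natural
    (assumed, not source-confirmed) mapping.
    """
    for k in range(len(b.current)):
        r = b.resistance[k]
        a = b.inductance[k] / dt
        # Trapezoidal rule:
        # (a + r/2) * i_new = (a - r/2) * i_old + (v_new + v_old) / 2
        rhs = (a - 0.5 * r) * b.current[k] + 0.5 * (voltage[k] + b.voltage_prev[k])
        b.current[k] = rhs / (a + 0.5 * r)
        b.voltage_prev[k] = voltage[k]


if __name__ == "__main__":
    # Two branches driven by a constant 1 V source (illustrative values only).
    branches = RLBranches(
        resistance=[1.0, 2.0],
        inductance=[0.01, 0.01],
        current=[0.0, 0.0],
        voltage_prev=[1.0, 1.0],
    )
    for _ in range(200):
        step_all(branches, [1.0, 1.0], dt=1e-3)
    print(branches.current)
```

With a constant source, each branch current should settle towards v/R, which gives a simple sanity check for the update. Keeping each field in its own contiguous array, rather than one object per component, is the kind of layout that data-oriented design favours because neighbouring threads then read neighbouring memory locations.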

Note: Dissertation, RWTH Aachen University, 2025

Contributing Institute(s):