Book/Dissertation / PhD Thesis FZJ-2024-05160

Eventify Meets Heterogeneity: Enabling Fine-Grained Task-Parallelism on GPUs



2024
Forschungszentrum Jülich GmbH, Zentralbibliothek, Verlag Jülich
ISBN: 978-3-95806-765-3

Jülich : Forschungszentrum Jülich GmbH, Zentralbibliothek, Verlag, Schriften des Forschungszentrums Jülich IAS Series 63, xv, 110 pages : illustrations, diagrams [10.34734/FZJ-2024-05160] = Dissertation, Techn. Univ. Chemnitz, 2023


Please use a persistent id in citations: doi:10.34734/FZJ-2024-05160

Abstract: Many scientific computing algorithms barely provide sufficient data parallelism to exploit the ever-increasing hardware parallelism of today’s heterogeneous computing environments. The challenge is to fully exploit the parallelization potential of such algorithms. To tackle this challenge, diverse task-parallel programming technologies have been introduced that allow for the flexible description of algorithms along task graphs. For algorithms with dense task graphs, however, task parallelism is still hard to exploit efficiently, since it is programmatically complex to describe and imposes high dependency resolution overheads on the execution model. This becomes especially challenging on GPUs, which are not designed for synchronization-heavy applications. The research objective of this thesis is an execution model that enables fine-grained task parallelism on GPUs. To reach this objective, the contributions of the thesis are fivefold. Firstly, it refines the stream interaction model behind Flynn’s Taxonomy as a uniform foundation for concurrency in architectures and programming models. Secondly, it analyzes the quantitative trends in CPU and GPU architectures and examines their influence on programming models. Thirdly, it introduces an execution model that enables threading, efficient blocking synchronization and queue-based task scheduling on GPUs. Fourthly, it ports the task-parallel programming library Eventify to GPUs. Fifthly, it examines the performance and sustainability of this approach with the task graph of a fast multipole method as a use case. The results show that fine-grained task parallelism improves execution time by an order of magnitude in comparison to classical loop-based data parallelism.


Note: Dissertation, Techn. Univ. Chemnitz, 2023

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)

Appears in the scientific report 2024
Database coverage:
Creative Commons Attribution CC BY 4.0 ; OpenAccess

The record appears in these collections:
Document types > Theses > Doctoral theses
Document types > Books > Books
Workflow collections > Public entries
Institute collections > JSC
Publications database
Open Access

Record created 2024-08-05, last modified 2025-01-06

