Journal Article FZJ-2025-00107

Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities


2024
IEEE New York, NY

IEEE Transactions on Parallel and Distributed Systems 35(9), 1551-1564 [doi:10.1109/TPDS.2024.3406764]


Please use a persistent id in citations: doi:10.1109/TPDS.2024.3406764

Abstract: With the rise of complex scientific simulations driven by workflows and heterogeneous workload profiles, managing system resources effectively is essential for improving performance and system throughput, especially given trends such as heterogeneous HPC and deeply integrated systems with on-chip accelerators. Dynamic resource allocation can improve resource utilization and productivity across all system and application levels by adapting the applications' configurations to the system's resources. In this context, malleable jobs, which can change their resources at runtime, can increase system throughput and resource utilization while bringing various advantages for HPC users (e.g., shorter waiting times). Malleability has received much attention recently, even though it has been an active research area for more than two decades. This article presents the state of the art of malleability implementations in HPC systems, targeting mainly malleability in compute and I/O resources. Based on our experiences, we state our current concerns and list future opportunities for research.


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5122 - Future Computing & Big Data Systems (POF4-512)
  2. DEEP-SEA - DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (955606)
  3. ADMIRE - Adaptive multi-tier intelligent data manager for Exascale (956748)
  4. TIME-X - TIME parallelisation: for eXascale computing and beyond (955701)
  5. Joint project: TIME-X - Parallelization of time-dependent simulations for future supercomputing (16HPC047)
  6. REGALE - An open architecture to equip next generation HPC applications with exascale capabilities (956560)

Appears in the scientific report 2024
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; Clarivate Analytics Master Journal List ; Current Contents - Engineering, Computing and Technology ; Ebsco Academic Search ; Essential Science Indicators ; IF >= 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal articles
Workflow collections > Public entries
Institute collections > JSC
Publications database
Open Access

Record created 2025-01-06, last modified 2025-02-03

