TY  - JOUR
AU  - Tarraf, Ahmad
AU  - Schreiber, Martin
AU  - Cascajo, Alberto
AU  - Besnard, Jean-Baptiste
AU  - Vef, Marc-André
AU  - Huber, Dominik
AU  - Happ, Sonja
AU  - Brinkmann, André
AU  - Singh, David E.
AU  - Hoppe, Hans-Christian
AU  - Miranda, Alberto
AU  - Peña, Antonio J.
AU  - Machado, Rui
AU  - Garcia-Gasulla, Marta
AU  - Schulz, Martin
AU  - Carpenter, Paul
AU  - Pickartz, Simon
AU  - Rotaru, Tiberiu
AU  - Iserte, Sergio
AU  - Lopez, Victor
AU  - Ejarque, Jorge
AU  - Sirwani, Heena
AU  - Carretero, Jesus
AU  - Wolf, Felix
TI  - Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities
JO  - IEEE Transactions on Parallel and Distributed Systems
VL  - 35
IS  - 9
SN  - 2161-9883
CY  - New York, NY
PB  - IEEE
M1  - FZJ-2025-00107
SP  - 1551
EP  - 1564
PY  - 2024
AB  - With the increase of complex scientific simulations driven by workflows and heterogeneous workload profiles, managing system resources effectively is essential for improving performance and system throughput, especially due to trends like heterogeneous HPC and deeply integrated systems with on-chip accelerators. For optimal resource utilization, dynamic resource allocation can improve productivity across all system and application levels, by adapting the applications' configurations to the system's resources. In this context, malleable jobs, which can change resources at runtime, can increase the system throughput and resource utilization while bringing various advantages for HPC users (e.g., shorter waiting time). Malleability has received much attention recently, even though it has been an active research area for more than two decades. This article presents the state of the art of malleable implementations in HPC systems, targeting mainly malleability in compute and I/O resources. Based on our experiences, we state our current concerns and list future opportunities for research.
LB  - PUB:(DE-HGF)16
AN  - WOS:001272190100002
DO  - 10.1109/TPDS.2024.3406764
UR  - https://juser.fz-juelich.de/record/1035001
ER  -