TY - CONF
AU - Li, Jie
AU - Michelogiannakis, George
AU - Maloney, Samuel
AU - Cook, Brandon
AU - Suarez, Estela
AU - Shalf, John
AU - Chen, Yong
TI - Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources
PB - IEEE
M1 - FZJ-2024-06434
SP - 297
EP - 309
PY - 2024
AB - Disaggregated memory promises to meet growing memory requirements of applications while improving system resource utilization in high-performance computing (HPC) systems. Compared to traditional systems—where expensive resources such as CPUs, GPUs, and memory, are assigned to jobs in units of nodes—systems with disaggregated memory introduce memory pools that can be shared among jobs; this introduces new optimization metrics to the job scheduler. In this paper, we propose a data-driven approach to evaluate job scheduling and resource configuration in HPC systems with disaggregated memory. To incorporate the memory requirements of jobs for both local and disaggregated memory resources and improve system efficiency in open-science HPC systems, we introduce a novel job scheduling algorithm called FM (Fair Memory). Our simulation results show that FM outperforms commonly-used job schedulers in terms of jobs’ bounded slowdown when the shared memory pool capacity is limited, and in terms of fairness under all conditions.
T2 - 2024 IEEE International Conference on Cluster Computing
CY - 24 Sep 2024 - 27 Sep 2024, Kobe (Japan)
Y2 - 24 Sep 2024 - 27 Sep 2024
M2 - Kobe, Japan
LB - PUB:(DE-HGF)8
DO - 10.1109/CLUSTER59578.2024.00033
UR - https://juser.fz-juelich.de/record/1033553
ER -