%0 Conference Paper
%A Li, Jie
%A Michelogiannakis, George
%A Maloney, Samuel
%A Cook, Brandon
%A Suarez, Estela
%A Shalf, John
%A Chen, Yong
%T Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources
%I IEEE
%M FZJ-2024-06434
%P 297-309
%D 2024
%X Disaggregated memory promises to meet the growing memory requirements of applications while improving system resource utilization in high-performance computing (HPC) systems. Compared to traditional systems, where expensive resources such as CPUs, GPUs, and memory are assigned to jobs in units of nodes, systems with disaggregated memory introduce memory pools that can be shared among jobs; this introduces new optimization metrics to the job scheduler. In this paper, we propose a data-driven approach to evaluate job scheduling and resource configuration in HPC systems with disaggregated memory. To incorporate the memory requirements of jobs for both local and disaggregated memory resources and improve system efficiency in open-science HPC systems, we introduce a novel job scheduling algorithm called FM (Fair Memory). Our simulation results show that FM outperforms commonly used job schedulers in terms of jobs’ bounded slowdown when the shared memory pool capacity is limited, and in terms of fairness under all conditions.
%B 2024 IEEE International Conference on Cluster Computing
%C 24 Sep 2024 - 27 Sep 2024, Kobe (Japan)
%F PUB:(DE-HGF)8
%9 Contribution to a conference proceedings
%R 10.1109/CLUSTER59578.2024.00033
%U https://juser.fz-juelich.de/record/1033553