Contribution to a conference proceedings FZJ-2024-06434

Job Scheduling in High Performance Computing Systems with Disaggregated Memory Resources


2024
IEEE

2024 IEEE International Conference on Cluster Computing (CLUSTER), Kobe, Japan, 24 Sep 2024 - 27 Sep 2024. IEEE, pp. 297-309. doi:10.1109/CLUSTER59578.2024.00033


Please use a persistent id in citations: doi:10.1109/CLUSTER59578.2024.00033

Abstract: Disaggregated memory promises to meet the growing memory requirements of applications while improving system resource utilization in high-performance computing (HPC) systems. Compared to traditional systems, where expensive resources such as CPUs, GPUs, and memory are assigned to jobs in units of nodes, systems with disaggregated memory introduce memory pools that can be shared among jobs; this adds new optimization metrics for the job scheduler. In this paper, we propose a data-driven approach to evaluate job scheduling and resource configuration in HPC systems with disaggregated memory. To incorporate the memory requirements of jobs for both local and disaggregated memory resources and improve system efficiency in open-science HPC systems, we introduce a novel job scheduling algorithm called FM (Fair Memory). Our simulation results show that FM outperforms commonly used job schedulers in terms of jobs' bounded slowdown when the shared memory pool capacity is limited, and in terms of fairness under all conditions.


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5122 - Future Computing & Big Data Systems (POF4-512)
  2. DEEP-SEA - DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (955606)

Appears in the scientific report 2024
Database coverage:
OpenAccess

The record appears in these collections:
Document types > Events > Contributions to a conference proceedings
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access

 Record created 2024-11-26, last modified 2025-02-04

