001033553 001__ 1033553
001033553 005__ 20250204215538.0
001033553 0247_ $$2doi$$a10.1109/CLUSTER59578.2024.00033
001033553 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-06434
001033553 037__ $$aFZJ-2024-06434
001033553 041__ $$aEnglish
001033553 1001_ $$0P:(DE-HGF)0$$aLi, Jie$$b0$$eCorresponding author
001033553 1112_ $$a2024 IEEE International Conference on Cluster Computing$$cKobe$$d2024-09-24 - 2024-09-27$$gCLUSTER$$wJapan
001033553 245__ $$aJob Scheduling in High Performance Computing Systems with Disaggregated Memory Resources
001033553 260__ $$bIEEE$$c2024
001033553 300__ $$a297-309
001033553 3367_ $$2ORCID$$aCONFERENCE_PAPER
001033553 3367_ $$033$$2EndNote$$aConference Paper
001033553 3367_ $$2BibTeX$$aINPROCEEDINGS
001033553 3367_ $$2DRIVER$$aconferenceObject
001033553 3367_ $$2DataCite$$aOutput Types/Conference Paper
001033553 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1736147305_25368
001033553 520__ $$aDisaggregated memory promises to meet the growing memory requirements of applications while improving system resource utilization in high-performance computing (HPC) systems. In contrast to traditional systems, where expensive resources such as CPUs, GPUs, and memory are assigned to jobs in units of nodes, systems with disaggregated memory introduce memory pools that can be shared among jobs, which creates new optimization metrics for the job scheduler. In this paper, we propose a data-driven approach to evaluate job scheduling and resource configuration in HPC systems with disaggregated memory. To incorporate the memory requirements of jobs for both local and disaggregated memory resources and to improve system efficiency in open-science HPC systems, we introduce a novel job scheduling algorithm called FM (Fair Memory). Our simulation results show that FM outperforms commonly used job schedulers in terms of jobs' bounded slowdown when the shared memory pool capacity is limited, and in terms of fairness under all conditions.
001033553 536__ $$0G:(DE-HGF)POF4-5122$$a5122 - Future Computing & Big Data Systems (POF4-512)$$cPOF4-512$$fPOF IV$$x0
001033553 536__ $$0G:(EU-Grant)955606$$aDEEP-SEA - DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (955606)$$c955606$$fH2020-JTI-EuroHPC-2019-1$$x1
001033553 588__ $$aDataset connected to CrossRef Conference
001033553 7001_ $$0P:(DE-HGF)0$$aMichelogiannakis, George$$b1$$eCorresponding author
001033553 7001_ $$0P:(DE-Juel1)200390$$aMaloney, Samuel$$b2$$ufzj
001033553 7001_ $$0P:(DE-HGF)0$$aCook, Brandon$$b3
001033553 7001_ $$0P:(DE-Juel1)142361$$aSuarez, Estela$$b4$$ufzj
001033553 7001_ $$0P:(DE-HGF)0$$aShalf, John$$b5
001033553 7001_ $$0P:(DE-HGF)0$$aChen, Yong$$b6
001033553 773__ $$a10.1109/CLUSTER59578.2024.00033$$p297-309$$y2024
001033553 8564_ $$uhttps://juser.fz-juelich.de/record/1033553/files/li2024_accepted_article.pdf$$yOpenAccess
001033553 909CO $$ooai:juser.fz-juelich.de:1033553$$pVDB$$pdriver$$popen_access$$popenaire$$pec_fundedresources$$pdnbdelivery
001033553 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)200390$$aForschungszentrum Jülich$$b2$$kFZJ
001033553 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)142361$$aForschungszentrum Jülich$$b4$$kFZJ
001033553 9131_ $$0G:(DE-HGF)POF4-512$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5122$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vSupercomputing & Big Data Infrastructures$$x0
001033553 9141_ $$y2024
001033553 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001033553 920__ $$lyes
001033553 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001033553 980__ $$acontrib
001033553 980__ $$aVDB
001033553 980__ $$aUNRESTRICTED
001033553 980__ $$aI:(DE-Juel1)JSC-20090406
001033553 9801_ $$aFullTexts