001031786 001__ 1031786
001031786 005__ 20250317091735.0
001031786 0247_ $$2doi$$a10.1109/SBAC-PAD63648.2024.00023
001031786 0247_ $$2datacite_doi$$a10.34734/FZJ-2024-05813
001031786 037__ $$aFZJ-2024-05813
001031786 041__ $$aEnglish
001031786 1001_ $$0P:(DE-Juel1)200390$$aMaloney, Samuel$$b0$$eCorresponding author$$ufzj
001031786 1112_ $$a2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing$$cHilo, HI$$d2024-11-13 - 2024-11-15$$gSBAC-PAD$$wUSA
001031786 245__ $$aAnalyzing HPC Monitoring Data With a View Towards Efficient Resource Utilization
001031786 260__ $$bIEEE$$c2024
001031786 29510 $$a2024 IEEE 36th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD)
001031786 300__ $$a170-181
001031786 3367_ $$2ORCID$$aCONFERENCE_PAPER
001031786 3367_ $$033$$2EndNote$$aConference Paper
001031786 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$mjournal
001031786 3367_ $$2BibTeX$$aINPROCEEDINGS
001031786 3367_ $$2DRIVER$$aconferenceObject
001031786 3367_ $$2DataCite$$aOutput Types/Conference Paper
001031786 3367_ $$0PUB:(DE-HGF)8$$2PUB:(DE-HGF)$$aContribution to a conference proceedings$$bcontrib$$mcontrib$$s1736144602_25368
001031786 3367_ $$0PUB:(DE-HGF)7$$2PUB:(DE-HGF)$$aContribution to a book$$mcontb
001031786 500__ $$aThe data used for this study are available at: https://doi.org/10.26165/JUELICH-DATA/BDFBPQ
001031786 500__ $$a979-8-3503-5616-8/24/$31.00 © 2024 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
001031786 520__ $$aCompute nodes in modern HPC systems are growing in size and their hardware has become ever more diverse. Still, many HPC centers allocate the resources of full nodes exclusively to avoid contention, despite the associated risk of underutilization. This paper describes a thorough resource utilization study of CPU and GPU compute and memory capacity, and interconnect bandwidth on JUWELS, a mature leadership-class modular supercomputer, with the aim of identifying opportunities for improving utilization through advanced scheduling and node sharing. Separate analysis of CPU-only and GPU-accelerated nodes finds that CPU compute usage is already close to optimal for the CPU-only nodes, whereas there is plenty of scope for co-scheduling CPU-based jobs on GPU-accelerated nodes. Memory capacity and node-level interconnect bandwidth are sufficient to provision co-scheduled jobs. We analyze multiple one-month datasets to validate robustness of conclusions over time and compare with previous studies on other systems to establish generalizability of results.
001031786 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001031786 536__ $$0G:(DE-HGF)POF4-5122$$a5122 - Future Computing & Big Data Systems (POF4-512)$$cPOF4-512$$fPOF IV$$x1
001031786 536__ $$0G:(EU-Grant)955606$$aDEEP-SEA - DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (955606)$$c955606$$fH2020-JTI-EuroHPC-2019-1$$x2
001031786 536__ $$0G:(DE-Juel-1)ATMLAO$$aATMLAO - ATML Application Optimization and User Service Tools (ATMLAO)$$cATMLAO$$x3
001031786 588__ $$aDataset connected to CrossRef Conference
001031786 7001_ $$0P:(DE-Juel1)142361$$aSuarez, Estela$$b1$$ufzj
001031786 7001_ $$0P:(DE-Juel1)132090$$aEicker, Norbert$$b2$$ufzj
001031786 7001_ $$0P:(DE-Juel1)162225$$aGuimaraes, Filipe$$b3$$ufzj
001031786 7001_ $$0P:(DE-Juel1)132108$$aFrings, Wolfgang$$b4$$ufzj
001031786 773__ $$a10.1109/SBAC-PAD63648.2024.00023$$p170-181$$t2643-3001$$y2024
001031786 8564_ $$uhttps://juser.fz-juelich.de/record/1031786/files/Maloney2024-postprint.pdf$$yOpenAccess
001031786 8564_ $$uhttps://juser.fz-juelich.de/record/1031786/files/SBAC-PAD-24-presentation.pdf$$yRestricted
001031786 909CO $$ooai:juser.fz-juelich.de:1031786$$pdnbdelivery$$pec_fundedresources$$pVDB$$pdriver$$popen_access$$popenaire
001031786 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)200390$$aForschungszentrum Jülich$$b0$$kFZJ
001031786 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)142361$$aForschungszentrum Jülich$$b1$$kFZJ
001031786 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132090$$aForschungszentrum Jülich$$b2$$kFZJ
001031786 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)162225$$aForschungszentrum Jülich$$b3$$kFZJ
001031786 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)132108$$aForschungszentrum Jülich$$b4$$kFZJ
001031786 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001031786 9131_ $$0G:(DE-HGF)POF4-512$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5122$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vSupercomputing & Big Data Infrastructures$$x1
001031786 9141_ $$y2024
001031786 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001031786 920__ $$lyes
001031786 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Centre$$x0
001031786 980__ $$acontrib
001031786 980__ $$aVDB
001031786 980__ $$aUNRESTRICTED
001031786 980__ $$ajournal
001031786 980__ $$acontb
001031786 980__ $$aI:(DE-Juel1)JSC-20090406
001031786 9801_ $$aFullTexts