001 | 1035001 | ||
005 | 20250203133237.0 | ||
024 | 7 | _ | |a 10.1109/TPDS.2024.3406764 |2 doi |
024 | 7 | _ | |a 10.34734/FZJ-2025-00107 |2 datacite_doi |
024 | 7 | _ | |a WOS:001272190100002 |2 WOS |
037 | _ | _ | |a FZJ-2025-00107 |
082 | _ | _ | |a 004 |
100 | 1 | _ | |a Tarraf, Ahmad |0 0000-0002-9174-5598 |b 0 |e Corresponding author |
245 | _ | _ | |a Malleability in Modern HPC Systems: Current Experiences, Challenges, and Future Opportunities |
260 | _ | _ | |a New York, NY |c 2024 |b IEEE |
336 | 7 | _ | |a article |2 DRIVER |
336 | 7 | _ | |a Output Types/Journal article |2 DataCite |
336 | 7 | _ | |a Journal Article |b journal |m journal |0 PUB:(DE-HGF)16 |s 1736774242_23768 |2 PUB:(DE-HGF) |
336 | 7 | _ | |a ARTICLE |2 BibTeX |
336 | 7 | _ | |a JOURNAL_ARTICLE |2 ORCID |
336 | 7 | _ | |a Journal Article |0 0 |2 EndNote |
520 | _ | _ | |a With the increase of complex scientific simulations driven by workflows and heterogeneous workload profiles, managing system resources effectively is essential for improving performance and system throughput, especially due to trends like heterogeneous HPC and deeply integrated systems with on-chip accelerators. For optimal resource utilization, dynamic resource allocation can improve productivity across all system and application levels, by adapting the applications' configurations to the system's resources. In this context, malleable jobs, which can change resources at runtime, can increase the system throughput and resource utilization while bringing various advantages for HPC users (e.g., shorter waiting time). Malleability has received much attention recently, even though it has been an active research area for more than two decades. This article presents the state of the art of malleable implementations in HPC systems, targeting mainly malleability in compute and I/O resources. Based on our experiences, we state our current concerns and list future opportunities for research. |
536 | _ | _ | |a 5122 - Future Computing & Big Data Systems (POF4-512) |0 G:(DE-HGF)POF4-5122 |c POF4-512 |f POF IV |x 0 |
536 | _ | _ | |a DEEP-SEA - DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (955606) |0 G:(EU-Grant)955606 |c 955606 |f H2020-JTI-EuroHPC-2019-1 |x 1 |
536 | _ | _ | |a ADMIRE - Adaptive multi-tier intelligent data manager for Exascale (956748) |0 G:(EU-Grant)956748 |c 956748 |f H2020-JTI-EuroHPC-2019-1 |x 2 |
536 | _ | _ | |a TIME-X - TIME parallelisation: for eXascale computing and beyond (955701) |0 G:(EU-Grant)955701 |c 955701 |f H2020-JTI-EuroHPC-2019-1 |x 3 |
536 | _ | _ | |a Verbundprojekt: TIME-X - Parallelisierung zeitabhängiger Simulationen für das zukünftige Supercomputing (16HPC047) |0 G:(BMBF)16HPC047 |c 16HPC047 |x 4 |
536 | _ | _ | |a REGALE - An open architecture to equip next generation HPC applications with exascale capabilities (956560) |0 G:(EU-Grant)956560 |c 956560 |f H2020-JTI-EuroHPC-2019-1 |x 5 |
588 | _ | _ | |a Dataset connected to CrossRef, Journals: juser.fz-juelich.de |
700 | 1 | _ | |a Schreiber, Martin |0 0000-0002-2390-6716 |b 1 |
700 | 1 | _ | |a Cascajo, Alberto |0 0000-0001-5506-1431 |b 2 |
700 | 1 | _ | |a Besnard, Jean-Baptiste |0 0000-0001-6500-6786 |b 3 |
700 | 1 | _ | |a Vef, Marc-André |0 0000-0001-7398-3034 |b 4 |
700 | 1 | _ | |a Huber, Dominik |0 0000-0001-9696-9382 |b 5 |
700 | 1 | _ | |a Happ, Sonja |0 P:(DE-Juel1)194671 |b 6 |u fzj |
700 | 1 | _ | |a Brinkmann, André |0 0000-0003-3083-2775 |b 7 |
700 | 1 | _ | |a Singh, David E. |0 0000-0002-8125-0049 |b 8 |
700 | 1 | _ | |a Hoppe, Hans-Christian |0 P:(DE-Juel1)194562 |b 9 |
700 | 1 | _ | |a Miranda, Alberto |0 0000-0002-1386-628X |b 10 |
700 | 1 | _ | |a Peña, Antonio J. |0 0000-0002-3575-4617 |b 11 |
700 | 1 | _ | |a Machado, Rui |0 0009-0009-2759-2302 |b 12 |
700 | 1 | _ | |a Garcia-Gasulla, Marta |0 0000-0003-3682-9905 |b 13 |
700 | 1 | _ | |a Schulz, Martin |0 0000-0001-9013-435X |b 14 |
700 | 1 | _ | |a Carpenter, Paul |0 0000-0002-9392-0521 |b 15 |
700 | 1 | _ | |a Pickartz, Simon |0 P:(DE-Juel1)177796 |b 16 |
700 | 1 | _ | |a Rotaru, Tiberiu |0 0009-0000-8455-5553 |b 17 |
700 | 1 | _ | |a Iserte, Sergio |0 0000-0003-3654-7924 |b 18 |
700 | 1 | _ | |a Lopez, Victor |0 0000-0002-3113-9166 |b 19 |
700 | 1 | _ | |a Ejarque, Jorge |0 0000-0003-4725-5097 |b 20 |
700 | 1 | _ | |a Sirwani, Heena |0 0000-0002-5629-1957 |b 21 |
700 | 1 | _ | |a Carretero, Jesus |0 0000-0002-1413-4793 |b 22 |
700 | 1 | _ | |a Wolf, Felix |0 P:(DE-Juel1)132299 |b 23 |
773 | _ | _ | |a 10.1109/TPDS.2024.3406764 |g Vol. 35, no. 9, p. 1551 - 1564 |0 PERI:(DE-600)2027774-X |n 9 |p 1551 - 1564 |t IEEE transactions on parallel and distributed systems |v 35 |y 2024 |x 2161-9883 |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1035001/files/Malleability_in_Modern_HPC_Systems_Current_Experiences_Challenges_and_Future_Opportunities.pdf |y OpenAccess |
909 | C | O | |o oai:juser.fz-juelich.de:1035001 |p openaire |p open_access |p driver |p VDB |p ec_fundedresources |p dnbdelivery |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 6 |6 P:(DE-Juel1)194671 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 9 |6 P:(DE-Juel1)194562 |
910 | 1 | _ | |a External Institute |0 I:(DE-HGF)0 |k Extern |b 16 |6 P:(DE-Juel1)177796 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action |1 G:(DE-HGF)POF4-510 |0 G:(DE-HGF)POF4-512 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Supercomputing & Big Data Infrastructures |9 G:(DE-HGF)POF4-5122 |x 0 |
914 | 1 | _ | |y 2024 |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0160 |2 StatID |b Essential Science Indicators |d 2023-08-29 |
915 | _ | _ | |a Creative Commons Attribution CC BY 4.0 |0 LIC:(DE-HGF)CCBY4 |2 HGFVOC |
915 | _ | _ | |a WoS |0 StatID:(DE-HGF)0113 |2 StatID |b Science Citation Index Expanded |d 2023-08-29 |
915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
915 | _ | _ | |a JCR |0 StatID:(DE-HGF)0100 |2 StatID |b IEEE T PARALL DISTR : 2022 |d 2025-01-02 |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0200 |2 StatID |b SCOPUS |d 2025-01-02 |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0300 |2 StatID |b Medline |d 2025-01-02 |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0600 |2 StatID |b Ebsco Academic Search |d 2025-01-02 |
915 | _ | _ | |a Peer Review |0 StatID:(DE-HGF)0030 |2 StatID |b ASC |d 2025-01-02 |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0199 |2 StatID |b Clarivate Analytics Master Journal List |d 2025-01-02 |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)1160 |2 StatID |b Current Contents - Engineering, Computing and Technology |d 2025-01-02 |
915 | _ | _ | |a DBCoverage |0 StatID:(DE-HGF)0150 |2 StatID |b Web of Science Core Collection |d 2025-01-02 |
915 | _ | _ | |a IF >= 5 |0 StatID:(DE-HGF)9905 |2 StatID |b IEEE T PARALL DISTR : 2022 |d 2025-01-02 |
920 | 1 | _ | |0 I:(DE-Juel1)JSC-20090406 |k JSC |l Jülich Supercomputing Center |x 0 |
980 | 1 | _ | |a FullTexts |
980 | _ | _ | |a journal |
980 | _ | _ | |a VDB |
980 | _ | _ | |a UNRESTRICTED |
980 | _ | _ | |a I:(DE-Juel1)JSC-20090406 |