TY  - JOUR
AU  - Witzler, Christian
AU  - Guimarães, Filipe Souza Mendes
AU  - Mira, Daniel
AU  - Anzt, Hartwig
AU  - Göbbert, Jens Henrik
AU  - Frings, Wolfgang
AU  - Bode, Mathis
TI  - JuMonC: A RESTful tool for enabling monitoring and control of simulations at scale
JO  - Future Generation Computer Systems
VL  - 164
SN  - 0167-739X
CY  - Amsterdam [u.a.]
PB  - Elsevier Science
M1  - FZJ-2024-06508
SP  - 107541
PY  - 2025
AB  - As systems and simulations grow in size and complexity, it is challenging to maintain efficient use of resources and avoid failures. In this scenario, monitoring becomes even more important and mandatory. This paper describes and discusses the benefits of the advanced monitoring and control tool JuMonC, which runs under user control alongside HPC simulations and provides valuable metrics via REST-API. In addition, plugin extensibility allows JuMonC to go a step further and provide computational steering of the simulation itself. To demonstrate the benefits and usability of JuMonC for large-scale simulations, two use cases are described employing nekRS and ICON on JURECA-DC, a supercomputer located at the Jülich Supercomputing Centre (JSC). Furthermore, a large-scale use case with nekRS on JSC’s flagship system JUWELS Booster is described. Finally, the interplay between JuMonC and LLview (a standard monitoring tool for HPC systems) is presented using a simple and secure JuMonC-LLview plugin, which collects performance metrics and enables their analysis in LLview. Overall, the portability and usefulness of JuMonC, together with its low performance impact, make it an important application for both current and future generations of exascale HPC systems.
LB  - PUB:(DE-HGF)16
AN  - WOS:001358353300001
DO  - 10.1016/j.future.2024.107541
UR  - https://juser.fz-juelich.de/record/1033636
ER  -