JuMonC: A RESTful tool for enabling monitoring and control of simulations at scale
Journal Article | FZJ-2024-06508
2025
Elsevier Science
Amsterdam [u.a.]
Please use a persistent id in citations: doi:10.1016/j.future.2024.107541 doi:10.34734/FZJ-2024-06508
Abstract: As systems and simulations grow in size and complexity, it becomes increasingly challenging to maintain efficient use of resources and avoid failures. In this scenario, monitoring becomes even more important and mandatory. This paper describes and discusses the benefits of the advanced monitoring and control tool JuMonC, which runs under user control alongside HPC simulations and provides valuable metrics via a REST API. In addition, plugin extensibility allows JuMonC to go a step further and provide computational steering of the simulation itself. To demonstrate the benefits and usability of JuMonC for large-scale simulations, two use cases are described employing nekRS and ICON on JURECA-DC, a supercomputer located at the Jülich Supercomputing Centre (JSC). Furthermore, a large-scale use case with nekRS on JSC’s flagship system JUWELS Booster is described. Finally, the interplay between JuMonC and LLview (a standard monitoring tool for HPC systems) is presented using a simple and secure JuMonC-LLview plugin, which collects performance metrics and enables their analysis in LLview. Overall, the portability and usefulness of JuMonC, together with its low performance impact, make it an important application for both current and future generations of exascale HPC systems.