Journal Article FZJ-2024-06508

JuMonC: A RESTful tool for enabling monitoring and control of simulations at scale

2025
Elsevier Science, Amsterdam [et al.]

Future Generation Computer Systems 164, 107541 [10.1016/j.future.2024.107541]

Abstract: As systems and simulations grow in size and complexity, it becomes challenging to use resources efficiently and avoid failures. In this setting, monitoring becomes even more important, to the point of being mandatory. This paper describes JuMonC, an advanced monitoring and control tool, and discusses its benefits. JuMonC runs under user control alongside HPC simulations and exposes valuable metrics through a REST API. In addition, its plugin extensibility allows JuMonC to go a step further and provide computational steering of the simulation itself. To demonstrate the benefits and usability of JuMonC for large-scale simulations, two use cases employing nekRS and ICON are described on JURECA-DC, a supercomputer located at the Jülich Supercomputing Centre (JSC). Furthermore, a large-scale use case with nekRS on JSC’s flagship system JUWELS Booster is described. Finally, the interplay between JuMonC and LLview (a standard monitoring tool for HPC systems) is presented using a simple and secure JuMonC-LLview plugin, which collects performance metrics and enables their analysis in LLview. Overall, its portability, usefulness, and low performance impact make JuMonC an important tool for current and future generations of exascale HPC systems.
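The core pattern described in the abstract, a monitoring component that runs alongside a simulation and exposes its state over a REST API, can be illustrated with a minimal Python sketch that uses only the standard library. This is not JuMonC and does not reflect its actual API: the `/metrics` path, the `step` and `residual` fields, and the toy simulation loop are hypothetical placeholders chosen for illustration.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer

# Shared state written by the (toy) simulation and read by the REST handler.
# The field names "step" and "residual" are hypothetical placeholders.
state = {"step": 0, "residual": 1.0}
lock = threading.Lock()


class MetricsHandler(BaseHTTPRequestHandler):
    """Serves a JSON snapshot of the shared state at the hypothetical path /metrics."""

    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        with lock:  # take a consistent snapshot while holding the lock
            body = json.dumps(state).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the console quiet


def run_simulation(steps):
    """Toy stand-in for a simulation: halves the residual at each step."""
    for i in range(1, steps + 1):
        with lock:
            state["step"] = i
            state["residual"] /= 2


def fetch_metrics(port):
    """Client side: query the endpoint the way an external monitoring tool might."""
    with urllib.request.urlopen(f"http://127.0.0.1:{port}/metrics") as resp:
        return json.loads(resp.read())


# Port 0 asks the OS for any free port; the assigned one is in server_address.
server = ThreadingHTTPServer(("127.0.0.1", 0), MetricsHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

run_simulation(steps=10)
metrics = fetch_metrics(port)
print(metrics)  # {'step': 10, 'residual': 0.0009765625}

server.shutdown()
server.server_close()
```

Serving requests from a background thread keeps HTTP handling out of the simulation loop, and the lock ensures a client never reads a half-updated snapshot. A production tool on an HPC system would likely also need access control and routing to compute nodes, which this sketch omits.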

Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  2. CoEC - Center of Excellence in Combustion (952181)
  3. IO-SEA - IO Software for Exascale Architecture (955811)
  4. DEEP-SEA - DEEP – SOFTWARE FOR EXASCALE ARCHITECTURES (955606)
  5. JLESC - Joint Laboratory for Extreme Scale Computing (JLESC-20150708)
  6. ATMLAO - ATML Application Optimization and User Service Tools (ATMLAO)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; Clarivate Analytics Master Journal List ; Current Contents - Engineering, Computing and Technology ; Essential Science Indicators ; IF >= 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Workflow collections > Public records
Workflow collections > Publication Charges
Institute Collections > JSC
Publications database
Open Access

 Record created 2024-11-27, last modified 2025-03-17