TY - EJOUR
AU - Kelling, Jeffrey
AU - Bolea, Vicente
AU - Bussmann, Michael
AU - Checkervarty, Ankush
AU - Debus, Alexander
AU - Ebert, Jan
AU - Eisenhauer, Greg
AU - Gutta, Vineeth
AU - Kesselheim, Stefan
AU - Klasky, Scott
AU - Pandit, Vedhas
AU - Pausch, Richard
AU - Podhorszki, Norbert
AU - Poschel, Franz
AU - Rogers, David
AU - Rustamov, Jeyhun
AU - Schmerler, Steve
AU - Schramm, Ulrich
AU - Steiniger, Klaus
AU - Widera, Rene
AU - Willmann, Anna
AU - Chandrasekaran, Sunita
TI - The Artificial Scientist -- in-transit Machine Learning of Plasma Simulations
PB - arXiv
M1 - FZJ-2026-01458
PY - 2025
AB - Increasing HPC cluster sizes and large-scale simulations that produce petabytes of data per run create massive I/O and storage challenges for analysis. Deep learning-based techniques, in particular, make use of these amounts of domain data to extract patterns that help build scientific understanding. Here, we demonstrate a streaming workflow in which simulation data is streamed directly to a machine-learning (ML) framework, circumventing the file system bottleneck. Data is transformed in transit, asynchronously to the simulation and the training of the model. With the presented workflow, data operations can be performed in common and easy-to-use programming languages, freeing the application user from adapting the application output routines. As a proof of concept, we consider a GPU-accelerated particle-in-cell (PIConGPU) simulation of the Kelvin-Helmholtz instability (KHI). We employ experience replay to avoid catastrophic forgetting in learning from this non-steady process in a continual manner. We detail challenges addressed while porting and scaling to the Frontier exascale system.
KW - Computational Physics (physics.comp-ph) (Other)
KW - Distributed, Parallel, and Cluster Computing (cs.DC) (Other)
KW - Machine Learning (cs.LG) (Other)
KW - FOS: Physical sciences (Other)
KW - FOS: Computer and information sciences (Other)
LB - PUB:(DE-HGF)25
DO - DOI:10.48550/ARXIV.2501.03383
UR - https://juser.fz-juelich.de/record/1053125
ER -