Journal Article FZJ-2025-02769

Metadata practices for simulation workflows

2025
Nature Publ. Group London

Scientific Data 12, 942 (2025) [10.1038/s41597-025-05126-1]

Abstract: Computer simulations are an essential pillar of knowledge generation in science. Exploring, understanding, reproducing, and sharing the results of simulations relies on tracking and organizing the metadata describing the numerical experiments. The models used to understand real-world systems, and the computational machinery required to simulate them, are typically complex, and produce large amounts of heterogeneous metadata. Here, we present general practices for acquiring and handling metadata that are agnostic to software and hardware, and highly flexible for the user. These consist of two steps: 1) recording and storing raw metadata, and 2) selecting and structuring metadata. As a proof of concept, we develop the Archivist, a Python tool to help with the second step, and use it to apply our practices to distinct high-performance computing use cases from neuroscience and hydrology. Our practices and the Archivist can readily be applied to existing workflows without the need for substantial restructuring. They support sustainable numerical workflows, fostering replicability, reproducibility, data exploration, and data sharing in simulation-based research.
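The two steps named in the abstract can be illustrated with a minimal sketch. This is not the Archivist's API, which this record does not describe. All function names, file names, and parameter keys below are hypothetical, and only the Python standard library is used. Step 1 writes raw metadata to files without interpreting it; step 2 selects chosen keys from those files and combines them into one structured record.

```python
import json
import platform
import sys
import tempfile
from pathlib import Path


def record_raw_metadata(run_dir: Path, parameters: dict) -> None:
    """Step 1 (hypothetical helper): store raw metadata as-is, one file per source."""
    run_dir.mkdir(parents=True, exist_ok=True)
    environment = {
        "python_version": platform.python_version(),
        "platform": platform.platform(),
        "argv": sys.argv,
    }
    (run_dir / "environment.json").write_text(json.dumps(environment, indent=2))
    (run_dir / "parameters.json").write_text(json.dumps(parameters, indent=2))


def select_and_structure(run_dir: Path, selection: dict) -> dict:
    """Step 2 (hypothetical helper): keep only the selected keys, grouped by source file."""
    structured = {}
    for filename, keys in selection.items():
        raw = json.loads((run_dir / filename).read_text())
        section = Path(filename).stem  # e.g. "parameters.json" -> "parameters"
        structured[section] = {key: raw[key] for key in keys if key in raw}
    return structured


def demo() -> dict:
    """Run both steps in a temporary directory with made-up simulation parameters."""
    with tempfile.TemporaryDirectory() as tmp:
        run_dir = Path(tmp) / "run_001"
        record_raw_metadata(run_dir, {"dt": 0.1, "n_steps": 1000, "seed": 42})
        selection = {
            "parameters.json": ["dt", "seed"],
            "environment.json": ["python_version"],
        }
        return select_and_structure(run_dir, selection)


if __name__ == "__main__":
    print(json.dumps(demo(), indent=2))
```

Keeping the raw files untouched and applying the selection afterwards means the selection can be changed later without rerunning the simulation, which is one plausible reading of why the abstract separates the two steps.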

Keyword(s): Information Retrieval (cs.IR) ; FOS: Computer and information sciences

Contributing Institute(s):
  1. Computational and Systems Neuroscience (IAS-6)
  2. Materials Data Science and Informatics (IAS-9)
Research Program(s):
  1. 5232 - Computational Principles (POF4-523)
  2. 5235 - Digitization of Neuroscience and User-Community Building (POF4-523)
  3. MetaMoSim - Generic metadata management for reproducible high-performance-computing simulation workflows (ZT-I-PF-3-026)
  4. HiRSE - Helmholtz Platform for Research Software Engineering (HiRSE-20250220)
  5. Advanced Computing Architectures (aca_20190115)
  6. EBRAINS 2.0 - EBRAINS 2.0: A Research Infrastructure to Advance Neuroscience and Brain Health (101147319)
  7. Brain-Scale Simulations (jinb33_20220812)
  8. ICEI - Interactive Computing E-Infrastructure for the Human Brain Project (800858)
  9. JL SMHB - Joint Lab Supercomputing and Modeling for the Human Brain (JL SMHB-2021-2027)
  10. DFG project G:(GEPRIS)491111487 - Open-Access-Publikationskosten / 2025 - 2027 / Forschungszentrum Jülich (OAPKFZJ) (491111487)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; DOAJ ; OpenAccess ; Article Processing Charges ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; DOAJ Seal ; Essential Science Indicators ; Fees ; IF >= 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal articles
Institute collections > IAS > IAS-9
Institute collections > IAS > IAS-6
Workflow collections > Public entries
Publications database
Open Access

Record created 2025-06-10, last modified 2025-08-04

