TY - CONF
AU - Hoffstaedter, Felix
TI - Reproducibility vs. computational efficiency on HPC systems
M1 - FZJ-2024-03087
PY - 2024
AB - HPC systems have particular hardware and software configurations that introduce specific challenges for implementing reproducible data processing workflows. The DataLad-based 'FAIRly big workflow' separates the compute environment from the processing pipeline, enabling automatic reproducibility across systems. Yet the sheer amount of RAM and number of CPUs on HPC systems allows for different ways to optimize compute jobs than on compute clusters, and certainly than on the average workstation or laptop. In this talk, I discuss general differences between HPC and more standard compute environments regarding the choices necessary to set up reproducible processing pipelines. Among the main factors are the availability of RAM, local storage, inodes and wall clock time.
T2 - Distribits: technologies for distributed data management
CY - 4 Apr 2024 - 4 Apr 2024, Düsseldorf (Germany)
Y2 - 4 Apr 2024 - 4 Apr 2024
M2 - Düsseldorf, Germany
LB - PUB:(DE-HGF)6
UR - https://juser.fz-juelich.de/record/1025704
ER -