Conference Presentation (Other) FZJ-2024-03087

Reproducibility vs. computational efficiency on HPC systems



2024

Distribits: technologies for distributed data management, Düsseldorf, Germany, 4 Apr 2024

Abstract: HPC systems have particular hardware and software configurations that introduce specific challenges for the implementation of reproducible data processing workflows. The DataLad-based 'FAIRly big workflow' separates the compute environment from the processing pipeline, enabling automatic reproducibility across systems. Yet the sheer amount of RAM and number of CPUs on HPC systems allows compute jobs to be optimized differently than on smaller compute clusters, and certainly than on the average workstation or laptop. In this talk, I discuss general differences between HPC and more standard compute environments with regard to the choices required to set up reproducible processing pipelines. Among the main factors are the availability of RAM, local storage, inodes, and wall-clock time.


Contributing Institute(s):
  1. Gehirn & Verhalten (INM-7)
Research Program(s):
  1. 5254 - Neuroscientific Data Analytics and AI (POF4-525)

Appears in the scientific report 2024


 Record created 2024-04-23, last modified 2024-05-06


