FAIRly big: A framework for computationally reproducible processing of large-scale data

Wagner, Adina S.; Waite, Laura K.; Wierzba, Małgorzata; Hoffstaedter, Felix; Waite, Alexander Q.; Poldrack, Benjamin; Eickhoff, Simon B.; Hanke, Michael
doi:10.1038/s41597-022-01163-2
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@ARTICLE{Wagner:906802,
      author       = {Wagner, Adina S. and Waite, Laura K. and Wierzba,
                      Małgorzata and Hoffstaedter, Felix and Waite, Alexander Q.
                      and Poldrack, Benjamin and Eickhoff, Simon B. and Hanke,
                      Michael},
      title        = {{FAIR}ly big: {A} framework for computationally
                      reproducible processing of large-scale data},
      journal      = {Scientific Data},
      volume       = {9},
      number       = {1},
      issn         = {2052-4436},
      address      = {London},
      publisher    = {Nature Publ. Group},
      reportid     = {FZJ-2022-01700},
      pages        = {80},
      year         = {2022},
      abstract     = {Large-scale datasets present unique opportunities to
                      perform scientific investigations with unprecedented
                      breadth. However, they also pose considerable challenges for
                      the findability, accessibility, interoperability, and
                      reusability (FAIR) of research outcomes due to
                      infrastructure limitations, data usage constraints, or
                      software license restrictions. Here we introduce a
                      DataLad-based, domain-agnostic framework suitable for
                      reproducible data processing in compliance with open science
                      mandates. The framework attempts to minimize platform
                      idiosyncrasies and performance-related complexities. It
                      affords the capture of machine-actionable computational
                      provenance records that can be used to retrace and verify
                      the origins of research outcomes, as well as be re-executed
                      independent of the original computing infrastructure. We
                      demonstrate the framework's performance using two showcases:
                      one highlighting data sharing and transparency (using the
                      studyforrest.org dataset) and another highlighting
                      scalability (using the largest public brain imaging dataset
                      available: the UK Biobank dataset).},
      cin          = {INM-7},
      ddc          = {500},
      cid          = {I:(DE-Juel1)INM-7-20090406},
      pnm          = {5254 - Neuroscientific Data Analytics and AI (POF4-525)},
      pid          = {G:(DE-HGF)POF4-5254},
      typ          = {PUB:(DE-HGF)16},
      pubmed       = {pmid:35277501},
      UT           = {WOS:000767813100012},
      doi          = {10.1038/s41597-022-01163-2},
      url          = {https://juser.fz-juelich.de/record/906802},
}