%0 Conference Paper
%A Köhler, Cristiano
%A Ulianych, Danylo
%A Gerkin, Richard C.
%A Davison, Andrew P.
%A Grün, Sonja
%A Denker, Michael
%T Provenance capture in the analysis of electrophysiology data: an example based on the Elephant package
%M FZJ-2020-03964
%D 2020
%X Workflows for the analysis of electrophysiology data typically comprise multiple steps, which should be fully documented when aiming at the reproducibility of the results. Considering the complexity, modularity and often iterative nature of such workflows, robust tools forming the basis of the workflow are necessary [1]. We focus here on two open-source tools used for the analysis of electrophysiology data. The Neo (RRID:SCR_000634) framework provides a data object model to standardize data of different origins [2]. Elephant (RRID:SCR_003833) is a toolbox for both standard and highly sophisticated analyses of simulated and experimental data [3]. The characterization of all data manipulations and the parameters throughout the workflow provides provenance information [4] that improves reproducibility of the results. This requires complete and self-explanatory descriptions of the data objects in the workflow and a method to minimize the need for manually tracking its execution. While the Neo framework provides a model to structure the neuronal data and associated metadata, a similar representation for the outputs of the analysis part of the workflow is still missing. Moreover, automated provenance capture is not available at the function level for a single Python script. Thus, existing tools must be improved to implement a data model that captures analysis outputs and workflow provenance and, ultimately, represents the analysis and its results in accordance with the FAIR principles [5]. Here we present a conceptual solution to capture provenance during the analysis of electrophysiology data. First, we introduce a standardization of the outputs of the Elephant functions, which is inspired by the Neo model. Thus, the information about the generation of an analysis output will be encapsulated in a new set of Python objects that can be easily re-used or shared. These objects will be integrated into the existing code bases with minimal disruption.
This will free the scientist from the need to manually annotate the output of the analysis. Second, we will show how to capture provenance information throughout the Python analysis script by using function decorators. These track the Elephant and user-defined functions in the script while mapping the inputs to the outputs, thereby also yielding a provenance trace in the form of a graph. We present a prototype implementation and demonstrate its use in a scenario where spike and LFP data are analyzed by standard methods. References: [1] Denker, M. and Grün, S. (2016). Designing Workflows for the Reproducible Analysis of Electrophysiological Data. In Brain-Inspired Computing, Amunts, K. et al., eds. (Cham: Springer International Publishing), pp. 58–72. [2] Garcia, S. et al. (2014). Neo: an object model for handling electrophysiology data in multiple formats. Frontiers in Neuroinformatics 8:10. [3] http://python-elephant.org [4] Ragan, E.D. et al. (2016). Characterizing Provenance in Visualization and Data Analysis: An Organizational Framework of Provenance Types and Purposes. IEEE Transactions on Visualization and Computer Graphics 22(1):31–40. [5] Wilkinson, M.D. et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018.
%B Online Bernstein Conference 2020
%C 29 Sep 2020 - 1 Oct 2020, online (Germany)
Y2 29 Sep 2020 - 1 Oct 2020
M2 online, Germany
%F PUB:(DE-HGF)24
%9 Poster
%R 10.12751/NNCN.BC2020.0098
%U https://juser.fz-juelich.de/record/885612