Capturing detailed provenance information in the analysis of electrophysiology data

Köhler, Cristiano; Denker, Michael; Ulianych, Danylo; Grün, Sonja; Gerkin, Richard C.; Davison, Andrew P.

TY - CONF
AU - Köhler, Cristiano
AU - Ulianych, Danylo
AU - Gerkin, Richard C.
AU - Davison, Andrew P.
AU - Grün, Sonja
AU - Denker, Michael
TI - Capturing detailed provenance information in the analysis of electrophysiology data
M1 - FZJ-2021-02267
PY - 2021
AB - The analysis of electrophysiology data typically comprises multiple steps. These often consist of several scripts executed in a specific sequence that take different parameter sets and use distinct data files. As the researcher adjusts the individual analysis steps to accommodate new hypotheses or additional data, the resulting workflows may become increasingly complex, and undergo frequent changes. Therefore, robust tools forming the workflows are necessary to fully document the workflow and improve the reproducibility of the results. Provenance refers to the characterization of data manipulations and corresponding parameters throughout the analysis [1]. It is possible to use workflow management systems to orchestrate the execution of the scripts and capture provenance information at the level of the script (i.e., which script file was executed, and in which environment?) and data file (i.e., which input and output files were supplied to that script). However, the resulting provenance track does not automatically provide details about the actual analysis carried out inside each script. Thus, analysis results can only be understood by source code inspection or trust in the correctness of any accompanying documentation. Here, we aim to improve existing tools by implementing a data model that captures detailed provenance information and by accurately representing the analysis results in a systematic and formalized manner. We focus on two open-source tools for the analysis of electrophysiology data. The Neo (RRID:SCR_000634) framework provides an object model to standardize neural activity data acquired from distinct sources [2]. Elephant (RRID:SCR_003833) is a Python toolbox that provides several functions for the analysis of electrophysiology data.3 We implemented prototypes of two complementary solutions to extend the functionality of Neo and Elephant to (i) automatically capture provenance information at the function-execution level inside a Python script, and to (ii) support the standardization of the analysis results together with the storage of relevant information describing their generation. The first solution is a set of data analysis objects that standardize the output of Elephant functions. They encapsulate all relevant parameters used by the function to generate the output, such that they can be easily re-used or shared. The second solution maps function inputs, outputs, and parameters throughout the execution of the Python analysis script, and builds a representation of the relationships between the different steps of the analysis within the script (i.e., the provenance trace). The captured information can be used to build a graph to visualize the steps followed in the script, and that can be stored together with the results as metadata. We compare the results obtained with or without the use of the two solutions on the basis of a realistic analysis scenario of electrophysiology data, showing the potential benefits for reproducibility, interoperability, discoverability, and re-use of analysis results. References: [1] Ragan et al. (2016) IEEE Trans Visual Comput Graphics 22:31. [2] Garcia et al. (2014) Front Neuroinform 8:10. [3] http://python-elephant.org.
T2 - 14th Göttingen Meeting of the German Neuroscience Society 2021
CY - 22 Mar 2021 - 30 Mar 2021, online (Germany)
Y2 - 22 Mar 2021 - 30 Mar 2021
M2 - online, Germany
LB - PUB:(DE-HGF)24
UR - https://juser.fz-juelich.de/record/892689
ER -

guest :: login JuSER
		Search		Submit		Personalize Your alerts Your baskets Your searches		Help