Tracking the provenance of data generation and analysis in NEST simulations
Poster (After Call) | FZJ-2024-05772
2024
Please use a persistent id in citations: doi:10.12751/NNCN.BC2024.031
Abstract: Neural simulations using NEST are typically executed by a Python script that configures the simulator kernel, builds the network, and runs the simulation. The result is a series of files containing the simulated network activity, which can then be analyzed to provide insights into the neural activity. Although file headers identify the origin of the outputs, a user analyzing the data must still interpret the findings with respect to the simulation setup, the network connectivity, and the parameters of the neuronal and synaptic models. This information is not immediately available: the exact details of the simulation configuration can be understood only by referring to the original script, which makes it challenging to share simulation results, especially in collaborative contexts. In addition, a researcher may change simulation parameters over time, and tracking those changes becomes increasingly difficult among collaborators with access to shared files containing the simulation output. The final results of a NEST simulation therefore lack detailed provenance linking each output to a detailed description of how the network was instantiated and run.

Here we showcase how Alpaca (doi:10.5281/zenodo.10276510; RRID:SCR_023739) [1] helps to capture provenance in a typical NEST simulation experiment and in the subsequent data analysis with the Elephant toolbox (doi:10.5281/zenodo.1186602; RRID:SCR_003833) [2]. Alpaca is a toolbox that captures provenance during the execution of Python scripts. It uses decorators to record the details of each executed function and its associated data objects. First, we demonstrate that Alpaca can capture end-to-end provenance in a workflow that executes multiple simulations with distinct parameters and performs a combined analysis of all generated data. Second, we highlight how data objects are annotated with simulation details using the Neo library [3] to identify the data source in the simulation.
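The decorator-based capture described above can be illustrated with a minimal, self-contained sketch. Note that this is a toy illustration of the general mechanism only, not Alpaca's actual API; the names `capture_provenance`, `PROVENANCE_RECORDS`, and `mean_firing_rate` are invented for this example:

```python
import functools

# Global list standing in for a provenance store (toy example only).
PROVENANCE_RECORDS = []

def capture_provenance(func):
    """Record the name, input types, and output type of each decorated call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        result = func(*args, **kwargs)
        PROVENANCE_RECORDS.append({
            "function": func.__name__,
            "inputs": [type(a).__name__ for a in args],
            "output": type(result).__name__,
        })
        return result
    return wrapper

@capture_provenance
def mean_firing_rate(spike_counts, duration):
    """Toy analysis step: spikes per second, averaged over units."""
    return sum(spike_counts) / (len(spike_counts) * duration)

rate = mean_firing_rate([10, 12, 8], 2.0)
```

After the call, `PROVENANCE_RECORDS` holds one entry describing the executed function and its data objects; a real tool would additionally hash the objects and serialize the captured trail to a standard provenance format.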
Third, we show how the details of network creation through the PyNEST interface are captured and related to each data output and analysis result. In the end, this approach contributes to representing the simulated data and analysis results according to the FAIR principles [4]: findability is improved by the detailed provenance, interoperability is supported by a standardized data model, and reusability is enabled by the enhanced description of the data generation and analysis processes.

References

[1] Köhler, C.A., Ulianych, D., Grün, S., Decker, S., Denker, M., 2024. Facilitating the sharing of electrophysiology data analysis results through in-depth provenance capture. eNeuro 11, ENEURO.0476-23.2024. doi:10.1523/ENEURO.0476-23.2024

[2] Denker, M., Yegenoglu, A., Grün, S., 2018. Collaborative HPC-enabled workflows on the HBP Collaboratory using the Elephant framework. Neuroinformatics 2018, P19. doi:10.12751/incf.ni2018.0019

[3] Garcia, S., Guarino, D., Jaillet, F., Jennings, T., Pröpper, R., Rautenberg, P.L., Rodgers, C.C., Sobolev, A., Wachtler, T., Yger, P., Davison, A.P., 2014. Neo: an object model for handling electrophysiology data in multiple formats. Frontiers in Neuroinformatics 8, 10. doi:10.3389/fninf.2014.00010

[4] Wilkinson, M.D., Dumontier, M., Aalbersberg, Ij.J., Appleton, G., Axton, M., Baak, A., Blomberg, N., et al., 2016. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data 3, 160018. doi:10.1038/sdata.2016.18
Keyword(s): Computational Neuroscience ; Data analysis, machine learning and neuroinformatics