% IMPORTANT: The following is UTF-8 encoded. This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.
@INPROCEEDINGS{Klijn:850819,
author = {Klijn, Wouter and Diaz, Sandra and Morrison, Abigail and
Schenck, Wolfram and Weyers, Benjamin and Peyser, Alexander},
title = {{M}odular {S}cience: {T}owards {O}nline {M}ulti
{A}pplication {C}oordination on {I}nhomogeneous {H}igh
{P}erformance {C}omputing and {N}euromorphic {H}ardware
{S}ystems},
reportid = {FZJ-2018-04590},
year = {2018},
abstract     = {Neuroscience is an interdisciplinary field with
                collaborators from biology, medicine, physics, mathematics,
                computer science and engineering. The complexity of the
brain indicates that only with the collaboration and
integration of knowledge from diverse fields will we be able
to gain significant traction for a better understanding of
its structure and function. This reality also makes
neuroscience a place where specialized tools to solve
computationally demanding problems at different scales play
a fundamental role. Supercomputers are currently an
important tool for simulation and data processing in most
areas of science. The need to study the brain using multiple
specialized software is giving rise to complex workflows
which also require expert monitoring and interaction.
High-performance computing (HPC) workflows in neuroscience
are already important in several subdomains, including:
1. Image processing: requiring big data processing/storing,
automatic brain tissue segmentation, identification, quality
control and reconstruction.
2. Brain simulation: from molecules to neurons, networks and
brain regions, simulations of the brain constitute a source
for the analysis and generation of new hypotheses regarding
the roles of connectivity, topology, morphology and
communication in function and cognition.
3. Visualization: the translation from data to information
is one of the most important steps in a workflow, as it
allows the scientist to quickly assess the success of the
analysis.
4. Data storage: from high resolution images to millions of
spiking events produced per second, data storage is one of
the biggest bottlenecks to solve in HPC neuroscientific
workflows.
5. Large parameter space exploration: our models are
imperfect,
the amount of experimental data that we have access to for
the brain is too small to constrain most models, and in
general deriving parameters for dynamical systems is
computationally intractable. This forces us to make large
parameter searches in order to fit our models to what we
measure in the laboratories. Parameter spaces can be so
large that a search for meaningful combinations becomes
intractable, requiring adaptive and efficient algorithms to
guide these explorations. Interactive monitoring by an
expert may also be desirable in many of these complex
searches. Such interactive supercomputing [1] enables
scientists to gain more insight into the impact that each
element in the model has on the observed outcome, and is a
goal of both the Human Brain Project and exascale computing
plans around the world. The live execution of workflow
pipelines is desirable due to the large amounts of
intermediate data generated, with inefficient or even
intractable storage requirements; the need for expert human
graphical interaction with the systems; and interaction with
multiscale systems working at different time scales or with
systems in contact with experiments. Launching applications
on HPC systems such as clusters and supercomputers requires
interaction with scheduling systems, setting up software and
configuring working environments. A successful deployment of
a job in a supercomputer (the execution of a defined set of
instructions) depends on the accurate definition of paths to
libraries, access to data input/output, availability of
required computational resources, correct definition of the
job within the limits imposed by the scheduler and the
correct execution of the instructions enclosed in the
job. Each of these dependencies can be a source of problems
that prevent the job from executing, requiring the
application neuroscientist to debug the pipeline. If this job is now
part of a complex workflow with increasing numbers of
software and hardware components, the dependencies multiply
and the potential for failure increases. In order to provide
the scientific community with new tools for the reliable and
efficient execution of complex and interactive workflows, we
have conceptualized “Modular Science”. The Modular
Science workflow is a software and social interaction
contract for the deployment of complex scientific workflows
on supercomputers. It can be seen as an orchestrator for
scientific applications. First, it is a software contract
because it defines how interfaces must be packaged,
described and shared between the different steps, and it
defines the operative limitations of each job. Second, it is
also a social contract because it helps scientists and
engineers agree on formats, infrastructure, environment
setups, limitations and desired behavior of these workflows.
Orchestrating the harmonious execution of HPC software
composed of multiple applications, usually developed by
different partners, is far from trivial. The Modular Science
orchestrator (a software component, Figure 1) serves as a
base from which each scientific domain can develop
agreements on how to better work and exploit the available
computational resources, minimizing the risk of failure on
these complex workflows and enabling new science. The nature
of the framework also allows the monitoring of basic
variables within these workflows, such as data bandwidth,
memory consumption and error tracking through consolidated
logging. A diagram of the relationships between different
applications and the Modular Science orchestrator can be
seen in Figure 1. The Modular Science orchestrator considers
the execution of a scientific workflow as a staged process
in order to enhance its robustness and probability of
success with minimal effort and loss of resources. For this
purpose the orchestrator, making use of envelope software
that attaches to and interacts with each job in the
workflow, tests the critical dependencies for execution of
each job independently before deployment. Then, it tests for
critical shared dependencies among two or more elements in
the workflow. It tests for connection channels, software
libraries, input/output paths and sources, privileges and
correct job configuration according to the limitations
imposed by the specific scheduler in the supercomputer of
choice. Once these dependencies have been tested on a single
node, a second stage with all required resources allocated
starts. Again, dependencies are verified for large scale
deployment and a final green light is given for the
execution of the full workflow. If something is not right in
the test stages, the deployment is canceled, notifying the
user and saving computing resources. This concept is
illustrated in Figure 1 for two applications in a
workflow. In this work we present the Modular Science
framework and the concrete set of use cases which are
guiding its development. A generic mapping of these
workflows into our framework can be seen in Figure 3. Our
first use case includes the interactive generation of neural
network models using connectivity based on experimental data
and executed on the NEST [2] simulator. Our second use case
considers the setup of a simulation executed in NEST,
interacting with a simulation in Arbor [3]. The output is
processed in Elephant [4] and also used to calculate local
field potentials [5]. The third use case encompasses the
generation of a full brain simulation using TVB [6] neural
mass models, based on connectivity from DTI
experiments. Simulation results must be compared against
functional experimental data to iteratively refine the
model. The process is observed by an expert who
interactively controls the parameter optimization process by
observing the results and evaluating the fitness of each
simulation instance. In our fourth use case, we plan to
enable a full brain simulation using neural mass models at
the global scope but which is coupled to local
representations of specific regions simulated in NEST.
Firing rate output from the neural mass models is translated
into spiking input for the detailed neuron scale
simulations. The output of the NEST simulation is afterwards
processed using Elephant for further analysis. The
development of the Modular Science framework is at an early
stage; here we present results using the initial proof of
concept in a multiscale simulation workflow as deployed on
the Jülich Supercomputing Centre’s infrastructure. Our
framework is open source and deployable in architectures
from local laboratory clusters to supercomputing centers and
compatible with most scheduling systems. In this way we
will not only benefit the neuroscience community; with
little effort, our framework can also be used in other
fields. This framework will support the reproducibility of
complex scientific workflows and a more robust and efficient
usage of available HPC and data storage resources. We aim to
provide the neuroscientific community with a new way to
interact, share and exploit the capacity of specialized
software on HPC in a consistent framework. Such a framework
will be crucial to attack new, large scale neuroscientific
problems requiring complex/multiscale workflows combining
pluggable simulators and analytical tools. References: [1]
Thomas Lippert and Boris Orth. Supercomputing infrastructure
for simulations of the human brain.},
month = {Jul},
date = {2018-07-13},
organization = {27th Annual Computational Neuroscience
Meeting, Seattle (USA), 13 Jul 2018 -
18 Jul 2018},
subtyp = {After Call},
cin = {JSC / JARA-HPC / INM-6},
cid = {I:(DE-Juel1)JSC-20090406 / $I:(DE-82)080012_20140620$ /
I:(DE-Juel1)INM-6-20090406},
pnm = {511 - Computational Science and Mathematical Methods
(POF3-511) / SMHB - Supercomputing and Modelling for the
Human Brain (HGF-SMHB-2013-2017) / SLNS - SimLab
Neuroscience (Helmholtz-SLNS)},
pid = {G:(DE-HGF)POF3-511 / G:(DE-Juel1)HGF-SMHB-2013-2017 /
G:(DE-Juel1)Helmholtz-SLNS},
typ = {PUB:(DE-HGF)24},
url = {https://juser.fz-juelich.de/record/850819},
}