% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Klijn:850819,
      author       = {Klijn, Wouter and Diaz, Sandra and Morrison, Abigail and
                      Schenck, Wolfram and Weyers, Benjamin and Peyser, Alexander},
      title        = {{Modular Science}: Towards Online Multi Application
                      Coordination on Inhomogeneous High Performance Computing
                      and Neuromorphic Hardware Systems},
      reportid     = {FZJ-2018-04590},
      year         = {2018},
      abstract     = {Neuroscience is an interdisciplinary field with
                      collaborators from biology, medicine, physics, mathematics,
                      computer scientists and engineers. The complexity of the
                      brain indicates that only with the collaboration and
                      integration of knowledge from diverse fields will we be able
                      to gain significant traction for a better understanding of
                      its structure and function. This reality also makes
                      neuroscience a place where specialized tools to solve
                      computationally demanding problems at different scales play
                      a fundamental role. Supercomputers are currently an
                      important tool for simulation and data processing in most
                      areas of science. The need to study the brain using multiple
                      specialized software is giving rise to complex workflows
                      which also require expert monitoring and interaction.
                      High-performance computing (HPC) workflows in neuroscience
                      are already important in several subdomains, including: 1.
                      Image processing: requiring big data processing/storing,
                      automatic brain tissue segmentation, identification, quality
                      control and reconstruction. 2. Brain simulation: from
                      molecules to neurons, networks and brain regions,
                      simulations of the brain constitute a source for the
                      analysis and generation of new hypothesis regarding the
                      roles of connectivity, topology, morphology and
                      communication in function and cognition. 3. Visualization:
                      the translation from data to information is one of the most
                      important steps in a workflow, as it allows the scientist to
                      quickly assess the success of the analysis. 4. Data storage:
                      from high resolution images to millions of spiking events
                      produced per second, data storage is one of the biggest
                      bottlenecks to solve in HPC neuroscientific workflows. 5.
                      Large parameter space exploration: our models are imperfect,
                      the amount of experimental data that we have access to for
                      the brain is too small to constrain most models, and in
                      general deriving parameters for dynamical systems is
                      computationally intractable. This forces us to make large
                      parameter searches in order to fit our models to what we
                      measure in the laboratories. Parameter spaces can be so
                      large that a search for meaningful combinations becomes
                      intractable, requiring adaptive and efficient algorithms to
                      guide these explorations. Interactive monitoring by an
                      expert may also be desirable in many of these complex
                      searches. Such interactive supercomputing [1] enables
                      scientists to gain more insight on the impact that each
                      element in the model has on the observed outcome — and is
                      a goal of both the Human Brain Project and exascale
                      computing plans around the world. The live execution of
                      workflow pipelines is desirable due to the large amount of
                      intermediate generated data with inefficient or even
                      intractable storage requirements, the need for expert human
                      graphical interaction with the systems and interaction with
                      multiscale systems working at different time scales or with
                      systems in contact with experiments. Launching applications
                      on HPC systems such as clusters and supercomputers requires
                      interaction with scheduling systems, setting up software and
                      configuring working environments. A successful deployment of
                      a job in a supercomputer (the execution of a defined set of
                      instructions) depends on the accurate definition of paths to
                      libraries, access to data input/output, availability of
                      required computational resources, correct definition of the
                      job within the limits imposed by the scheduler and the
                      correct execution of the instructions enclosed in the
                      job. Each of these dependencies can be a source of problems
                      which prevent the job execution, requiring the application
                      neuroscientist to debug the pipeline. If this job is now
                      part of a complex workflow with increasing numbers of
                      software and hardware components, the dependencies multiply
                      and the potential for failure increases. In order to provide
                      the scientific community new tools to allow the reliable and
                      efficient execution of complex and interactive workflows, we
                      have conceptualized “Modular Science”. The Modular
                      Science workflow is a software and social interaction
                      contract for the deployment of complex scientific workflows
                      on supercomputers. It can be seen as an orchestrator for
                      scientific applications. First, it is a software contract
                      because it defines the interfaces must be packed, described
                      and shared between different steps and it defines the
                      operative limitations of each job. On the other hand it is
                      also a social contract because it helps scientists and
                      engineers agree on formats, infrastructure, environment
                      setups, limitations and desired behavior of these workflows.
                      It is not trivial to orchestrate the harmonious execution of
                      HPC software which usually was developed by different
                      partners and has multiple applications. The modular science
                      orchestrator (a software component, Figure 1) serves as a
                      base from which each scientific domain can develop
                      agreements on how to better work and exploit the available
                      computational resources, minimizing the risk of failure on
                      these complex workflows and enabling new science. The nature
                      of the framework also allows the monitoring of basic
                      variables within these workflows, such as data bandwidth,
                      memory consumption and error tracking through consolidated
                      logging. A diagram of the relationships between different
                      applications and the modular science orchestrator can be
                      seen in Figure 1. The modular science orchestrator considers
                      the execution of a scientific workflow as a staged process
                      in order to enhance its robustness and probability of
                      success with minimal effort and loss of resources. For this
                      purpose the orchestrator, making use of envelope software
                      which adheres and interacts to each job in the workflow,
                      tests the critical dependencies for execution of each job
                      independently before deployment. Then, it tests for critical
                      shared dependencies among two or more elements in the
                      workflow. It tests for connection channels, software
                      libraries, input/output paths and sources, privileges and
                      correct job configuration according to the limitations
                      imposed by the specific scheduler in the supercomputer of
                      choice. Once these dependencies have been tested in a single
                      node, a second stage with all required resources allocated
                      starts. Again dependencies are verified for large scale
                      deployment and a final green light is given for the
                      execution of the full workflow. If something is not right in
                      the test stages, the deployment is canceled, notifying the
                      user and saving computing resources. This concept is
                      illustrated in Figure 1 for two applications in a
                      workflow. In this work we present the Modular Science
                      framework and the concrete set of use cases which are
                      guiding its development. A generic mapping of these
                      workflows into our framework can be seen in Figure 3. Our
                      first use case includes the interactive generation of neural
                      network models using connectivity based on experimental data
                      and executed on the NEST [2] simulator. Our second use case
                      considers the setup of a simulation executed in NEST,
                      interacting with a simulation in Arbor [3]. The output is
                      processed in Elephant [4] and also used to calculate local
                      field potentials [5]. The third use case encompasses the
                      generation of a full brain simulation using TVB [6] neural
                      mass models, based on connectivity from DTI
                      experiments. Simulation results must be compared against
                      functional experimental data to iteratively refine the
                      model. The process is observed by an expert who
                      interactively controls the parameter optimization process by
                      observing the results and evaluating the fitness of each
                      simulation instance. In our fourth use case, we plan to
                      enable a full brain simulation using neural mass models at
                      the global scope but which is coupled to local
                      representations of specific regions simulated in NEST.
                      Firing rate output from the neural mass models is translated
                      into spiking input for the detailed neuron scale
                      simulations. The output of the NEST simulation is afterwards
                      processed using Elephant for further analysis. The
                      development of the Modular Science framework is at an early
                      stage; here we present results using the initial proof of
                      concept in a multiscale simulation workflow as deployed on
                      the Jülich Supercomputing Centre’s infrastructure. Our
                      framework is open source and deployable in architectures
                      from local laboratory clusters to supercomputing centers and
                      compatible with most scheduling systems. By doing this, we
                      will not only benefit the neuroscience community but, with
                      little effort, our framework can be also used in other
                      fields. This framework will support the reproducibility of
                      complex scientific workflows and a more robust but efficient
                      usage of available HPC and data storage resources. We aim at
                      providing the neuroscientific community a new way to
                      interact, share and exploit the capacity of specialized
                      software on HPC in a consistent framework. This framework
                      will support the reproducibility of large workflows and
                      robust but efficient usage of HPC and data storage
                      resources. Such a framework will be crucial to attack new,
                      large scale neuroscientific problems requiring
                      complex/multiscale workflows combining pluggable simulators
                      and analytical tools. References: [1] Thomas Lippert and
                      Boris Orth. Supercomputing infrastructure for simulations of
                      the human brain.},
      month        = jul,
      date         = {2018-07-13},
      organization = {27th Annual Computational Neuroscience
                      Meeting, Seattle (USA), 13 Jul 2018 -
                      18 Jul 2018},
      subtyp       = {After Call},
      cin          = {JSC / JARA-HPC / INM-6},
      cid          = {I:(DE-Juel1)JSC-20090406 / $I:(DE-82)080012_20140620$ /
                      I:(DE-Juel1)INM-6-20090406},
      pnm          = {511 - Computational Science and Mathematical Methods
                      (POF3-511) / SMHB - Supercomputing and Modelling for the
                      Human Brain (HGF-SMHB-2013-2017) / SLNS - SimLab
                      Neuroscience (Helmholtz-SLNS)},
      pid          = {G:(DE-HGF)POF3-511 / G:(DE-Juel1)HGF-SMHB-2013-2017 /
                      G:(DE-Juel1)Helmholtz-SLNS},
      typ          = {PUB:(DE-HGF)24},
      url          = {https://juser.fz-juelich.de/record/850819},
}