On the use of containers for machine learning and visualization workflows on JUWELS

Gong, Bing; Mozaffari, Amirpasha; Schultz, Martin; Vogelsang, Jan
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Gong:890154,
      author       = {Gong, Bing and Vogelsang, Jan and Mozaffari, Amirpasha and
                      Schultz, Martin},
      title        = {{O}n the use of containers for machine learning and
                      visualization workflows on {JUWELS}},
      reportid     = {FZJ-2021-00743},
      year         = {2020},
      abstract     = {Containers bundle code and its dependencies into a single
                      package so that it can run reliably and efficiently in
                      different computing environments. They promise the same
                      level of isolation and security as a virtual machine and a
                      higher degree of integration with the host operating system
                      (OS). From a user perspective, the main benefits of
                      containers are greater software flexibility, reliability, ease
                      of deployment, and portability. Containers have become very
                      popular on cloud systems, but they have not been used much
                      in HPC environments. In this study, we have tested the use
                      of containers and measured the performance of the
                      containerized workflows for two separate applications on
                      HPC systems. In the first use case, we have automated the
                      visualization process of global wildfire activity and the
                      resulting “smoke” plumes from numerical model results of
                      the Copernicus Atmosphere Monitoring Service
                      (https://www.ecmwf.int/en/about/what-we-do/environmental-services/copernicus-atmosphere-monitoring-service).
                      The motivation for this workflow was to expedite the process
                      of visualizing new fire situations without having to engage
                      several people along the workflow, from data extraction and
                      data transformation to the actual visualisation. Once a
                      container workflow is defined for this application, it can
                      be easily adapted to work with other model variables, time
                      periods, etc. Therefore, we built a Singularity container
                      that includes the pre-processing steps of the
                      visualization process for an arbitrary dataset. Preliminary
                      results on the JUWELS system at the Jülich Supercomputing
                      Centre (JSC) have shown satisfactory scaling of the
                      application across multiple nodes. Work has begun to
                      automate the full visualization process, including the
                      ParaView application. For the second use case, we have
                      partially containerized the machine learning workflow in the
                      context of weather and climate applications. In this proof
                      of concept, we are adopting a deep learning architecture for
                      video frame prediction to forecast the surface temperature
                      fields over Europe for up to 20 hours based on ERA5
                      reanalysis data. Since this workflow requires immense data
                      processing and the evaluation of various deep learning
                      architectures, we have developed a containerized workflow
                      for the full lifecycle of the application, which can run in
                      parallel on several nodes. This containerized application
                      uses Docker and Sarus and entails data extraction, data
                      pre-processing, training, post-processing, and
                      visualisation. The preliminary results of the containerized
                      application on up to 8 nodes of the Piz Daint HPC system at
                      the Swiss National Supercomputing Centre show a satisfactory
                      level of scalability. In the next phase of this study, we
                      will adapt the application to Singularity and run it on
                      the JUWELS system at JSC.},
      month        = {Feb},
      date         = {2020-02-27},
      organization = {NIC Symposium 2020, Jülich (Germany),
                      27 Feb 2020 - 28 Feb 2020},
      cin          = {JSC},
      cid          = {I:(DE-Juel1)JSC-20090406},
      pnm          = {512 - Data-Intensive Science and Federated Computing
                      (POF3-512) / IntelliAQ - Artificial Intelligence for Air
                      Quality (787576) / Earth System Data Exploration (ESDE)},
      pid          = {G:(DE-HGF)POF3-512 / G:(EU-Grant)787576 /
                      G:(DE-Juel-1)ESDE},
      typ          = {PUB:(DE-HGF)1},
      url          = {https://juser.fz-juelich.de/record/890154},
}