000890154 001__ 890154
000890154 005__ 20230127125338.0
000890154 037__ $$aFZJ-2021-00743
000890154 041__ $$aEnglish
000890154 1001_ $$0P:(DE-Juel1)177767$$aGong, Bing$$b0$$eCorresponding author
000890154 1112_ $$aNIC Symposium 2020$$cJülich$$d2020-02-27 - 2020-02-28$$wGermany
000890154 245__ $$aOn the use of containers for machine learning and visualization workflows on JUWELS
000890154 260__ $$c2020
000890154 3367_ $$0PUB:(DE-HGF)1$$2PUB:(DE-HGF)$$aAbstract$$babstract$$mabstract$$s1611580634_2888
000890154 3367_ $$033$$2EndNote$$aConference Paper
000890154 3367_ $$2BibTeX$$aINPROCEEDINGS
000890154 3367_ $$2DRIVER$$aconferenceObject
000890154 3367_ $$2DataCite$$aOutput Types/Conference Abstract
000890154 3367_ $$2ORCID$$aOTHER
000890154 520__ $$aContainers bundle a piece of software together with all of its dependencies into a single package so that it runs reliably and efficiently across different computing environments. They promise a level of isolation and security similar to that of a virtual machine, combined with a higher degree of integration with the host operating system (OS). From a user perspective, the main benefits of containers are greater software flexibility, reliability, ease of deployment, and portability. Containers have become very popular on cloud systems, but they have not been widely used in HPC environments. In this study, we have tested the use of containers and measured the performance of containerized workflows for two separate applications on HPC systems. In the first use case, we have automated the visualization of global wildfire activity and the resulting “smoke” plumes from numerical model results of the Copernicus Atmosphere Monitoring Service (https://www.ecmwf.int/en/about/what-we-do/environmental-services/copernicus-atmosphere-monitoring-service). The motivation for this workflow was to expedite the visualization of new fire situations without having to engage several people along the workflow, from data extraction through data transformation to the actual visualization. Once a container workflow is defined for this application, it can easily be adapted to other model variables, time periods, etc. We therefore built a container with Singularity that includes the pre-processing steps of the visualization pipeline for an arbitrary dataset. Preliminary results on the JUWELS system at the Jülich Supercomputing Centre (JSC) show satisfactory scaling of the application across multiple nodes. Work has begun to automate the full visualization process, including the ParaView application. For the second use case, we have partially containerized a machine learning workflow in the context of weather and climate applications. In this proof of concept, we adopt a deep learning architecture for video frame prediction to forecast surface temperature fields over Europe for up to 20 hours based on ERA5 reanalysis data. Since this workflow requires extensive data processing and the evaluation of various deep learning architectures, we have developed a containerized workflow for the full lifecycle of the application, which can run in parallel on several nodes. This containerized application uses Docker and Sarus and comprises data extraction, data pre-processing, training, post-processing, and visualization. Preliminary results of the containerized application on up to 8 nodes of the Piz Daint HPC system at the Swiss National Supercomputing Centre show a satisfactory level of scalability. In the next phase of this study, we will port the application to Singularity and run it on the JUWELS system at JSC.
000890154 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000890154 536__ $$0G:(EU-Grant)787576$$aIntelliAQ - Artificial Intelligence for Air Quality (787576)$$c787576$$fERC-2017-ADG$$x1
000890154 536__ $$0G:(DE-Juel-1)ESDE$$aEarth System Data Exploration (ESDE)$$cESDE$$x2
000890154 7001_ $$0P:(DE-Juel1)173676$$aVogelsang, Jan$$b1
000890154 7001_ $$0P:(DE-Juel1)166264$$aMozaffari, Amirpasha$$b2
000890154 7001_ $$0P:(DE-Juel1)6952$$aSchultz, Martin$$b3
000890154 909CO $$ooai:juser.fz-juelich.de:890154$$pec_fundedresources$$pVDB$$popenaire
000890154 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)177767$$aForschungszentrum Jülich$$b0$$kFZJ
000890154 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)173676$$aForschungszentrum Jülich$$b1$$kFZJ
000890154 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)166264$$aForschungszentrum Jülich$$b2$$kFZJ
000890154 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)6952$$aForschungszentrum Jülich$$b3$$kFZJ
000890154 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000890154 9141_ $$y2020
000890154 920__ $$lyes
000890154 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000890154 980__ $$aabstract
000890154 980__ $$aVDB
000890154 980__ $$aI:(DE-Juel1)JSC-20090406
000890154 980__ $$aUNRESTRICTED