TY - CONF
AU - Tian, Liang
AU - Sedona, Rocco
AU - Mozaffari, Amirpasha
AU - Kreshpa, Enxhi
AU - Paris, Claudia
AU - Riedel, Morris
AU - Schultz, Martin G.
AU - Cavallaro, Gabriele
TI - End-to-End Process Orchestration of Earth Observation Data Workflows with Apache Airflow on High Performance Computing
PB - IEEE
M1 - FZJ-2023-04455
SP - 711
EP - 714
PY - 2023
AB - Earth Observation (EO) data processing faces challenges due to large volumes, multiple sources, and diverse formats. To address this issue, this paper presents a scalable and parallelizable workflow using Apache Airflow, capable of integrating Machine Learning (ML) and Deep Learning (DL) models with Modular Supercomputing Architecture (MSA) systems. To test the workflow, we considered the production of large-scale Land-Cover (LC) maps as a case study. The workflow manager, Airflow, offers scalability, extensibility, and programmable task definition in Python. It allows us to execute different steps of the workflow in different High-Performance Computing (HPC) systems. The workflow is demonstrated on the Dynamical Exascale Entry Platform (DEEP) and Jülich Research on Exascale Cluster Architectures (JURECA) hosted at the Jülich Supercomputing Centre (JSC), a platform that incorporates heterogeneous JSC systems.
T2 - IEEE International Geoscience and Remote Sensing Symposium (IGARSS)
CY - 16 Jul 2023 - 21 Jul 2023, Pasadena (CA)
Y2 - 16 Jul 2023 - 21 Jul 2023
M2 - Pasadena, CA
LB - PUB:(DE-HGF)8 ; PUB:(DE-HGF)7
AN - WOS:001098971601004
DO - 10.1109/IGARSS52108.2023.10283416
UR - https://juser.fz-juelich.de/record/1017950
ER -