Conference Presentation (Other) FZJ-2020-01965

A Statistical Model for Automated Quality Assessment of the TOAR-II

2020

EGU2020: Sharing Geoscience Online, #shareEGU20, Vienna, Austria, 4 May 2020 - 8 May 2020 [10.5194/egusphere-egu2020-13357]

Abstract: The Tropospheric Ozone Assessment Report, phase 2 (TOAR-II) database is a global collection of in-situ measurements of ground-level ozone from many locations. It also holds data on selected ozone precursors and meteorological variables. Because TOAR-II assembles air quality data from many different sources, it requires a common data quality assessment (QA) to ensure that the data meet the quality required for globally consistent analyses. The large volume of the database (more than 100,000 data series) makes automated, data-driven QA procedures necessary. Accordingly, we have developed a statistical model for automated QA. The model consists of several statistical tests that are grouped into sub-groups. Each individual data point is assigned a QA score, an indicator ranging from 0 to 1 that estimates the plausibility of its value. The concept is founded on statistical hypothesis testing and probability theory. The model is implemented in a Python package called AutoQA4Env. One application of AutoQA4Env is the TOAR-II data ingestion workflow: the tool generates a data quality report that is sent back to the data provider for inspection. Because AutoQA4Env is easily configurable, users can set quality thresholds and thus filter the data according to their use case. While AutoQA4Env is primarily being developed for air quality data, the same concept and model may be applicable to other databases, and the software framework is flexible enough to support other use cases.
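The per-point scoring idea described in the abstract can be sketched in Python. This is a minimal sketch under stated assumptions, not the AutoQA4Env API or its actual test suite: the function names (robust_zscore_test, range_test, qa_scores, filter_by_threshold), the 0 to 300 ppb bounds, the use of a two-sided p-value as the score of a statistical test, and the rule of taking the minimum across tests are all hypothetical choices made for illustration. Each test returns a value in [0, 1] for every data point, the test results are combined into one QA score per point, and a user-chosen threshold filters the series.

```python
import math
import statistics
from typing import Callable, Dict, List, Sequence

# A "test" maps a data series to one plausibility value in [0, 1] per point.
# This signature is a hypothetical illustration, not the AutoQA4Env API.
Test = Callable[[Sequence[float]], List[float]]


def robust_zscore_test(series: Sequence[float]) -> List[float]:
    """Two-sided p-value of each point's robust z-score.

    Null hypothesis: the value comes from a normal distribution whose
    centre and spread are estimated by the median and the scaled median
    absolute deviation (MAD). Extreme values get scores close to 0.
    """
    med = statistics.median(series)
    mad = statistics.median([abs(x - med) for x in series]) * 1.4826
    if mad == 0:
        # Spread cannot be estimated; this sketch then treats all points as plausible.
        return [1.0] * len(series)
    # P(|Z| >= |z|) = erfc(|z| / sqrt(2)) for a standard normal Z.
    return [math.erfc(abs(x - med) / mad / math.sqrt(2)) for x in series]


def range_test(lower: float, upper: float) -> Test:
    """Hard plausibility bounds: score 1 inside [lower, upper], 0 outside."""
    def test(series: Sequence[float]) -> List[float]:
        return [1.0 if lower <= x <= upper else 0.0 for x in series]
    return test


def qa_scores(series: Sequence[float], tests: Dict[str, Test]) -> List[float]:
    """Combine per-test scores into one QA score per point (minimum rule)."""
    per_test = [test(series) for test in tests.values()]
    return [min(point_scores) for point_scores in zip(*per_test)]


def filter_by_threshold(series: Sequence[float], scores: Sequence[float],
                        threshold: float) -> List[float]:
    """Keep only values whose QA score reaches the user-chosen threshold."""
    return [x for x, s in zip(series, scores) if s >= threshold]


# Made-up hourly ozone values in ppb; 95.0 is a spike and -4.0 is physically impossible.
ozone = [31.0, 33.5, 30.2, 32.8, 29.9, 31.7, 95.0, 30.5, -4.0, 32.1]
# Illustrative bounds only, not TOAR-II's actual thresholds.
tests = {"range": range_test(0.0, 300.0), "robust_z": robust_zscore_test}
scores = qa_scores(ozone, tests)
kept = filter_by_threshold(ozone, scores, threshold=0.05)
print(kept)  # [31.0, 33.5, 30.2, 32.8, 29.9, 31.7, 30.5, 32.1]
```

Taking the minimum makes a point only as plausible as its worst test result. A product of scores (which implicitly assumes independent tests) or a weighted mean within each sub-group would be equally reasonable design choices. Using the median and MAD instead of the mean and standard deviation keeps a single large spike from inflating the spread estimate and thereby masking itself.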


Contributing Institute(s):
  1. Jülich Supercomputing Centre (JSC)
Research Program(s):
  1. 512 - Data-Intensive Science and Federated Computing (POF3-512)
  2. IntelliAQ - Artificial Intelligence for Air Quality (787576)
  3. Earth System Data Exploration (ESDE)

Appears in the scientific report 2020
Database coverage:
Creative Commons Attribution CC BY 4.0; OpenAccess
The record appears in these collections:
Document types > Presentations > Conference Presentations
Workflow collections > Public records
Institute Collections > JSC
JuOSC (Juelich Open Science Collection)
Publications database
Open Access

 Record created 2020-05-12, last modified 2023-07-11

