A Novel Concept for Automated Quality Control of Atmospheric Time Series

Kaffashzadeh, Najmeh; Schröder, Sabine; Schultz, Martin

Poster (After Call)

FZJ-2019-03957

Kaffashzadeh, N. (Corresponding author, FZJ); Schröder, S. (FZJ); Schultz, M. (FZJ)

2019

European Geosciences Union (EGU) General Assembly 2019, Vienna, Austria, 7 Apr 2019 - 12 Apr 2019

Please use a persistent id in citations: http://hdl.handle.net/2128/22561

Abstract: Measurements of atmospheric physical and chemical parameters are essential for atmospheric model evaluation, trend analysis, climate prediction, and other applications. Particularly when the time series from various measurement instruments or data providers are merged together, assessing the quality of the data presents a major challenge and often relies on subjective screening. The quality of the time series can be affected by several error types, such as random error, systematic error due to calibration errors, and gross error from malfunctioning instruments, or data processing errors, such as mistyped values and improper date-time formats. Some of these errors may have a considerable impact on the statistical analysis of the time series. Thus, identifying the quality of the data, i.e. quality control (QC), is an essential step for any data analysis.

Here, we present a software package for the automated QC of the atmospheric time series based on the use of several algorithms that are in use at various environmental agencies and research initiatives. The tool can either be embedded in automated workflows to process real-time data or be applied to a second-level analysis of archived multi-year data. Several statistical tests are grouped in categories with increasing complexity. Any number of tests can be defined and run sequentially. The set of statistical tests and any user arguments can easily be configured with variable-specific control files in the JSON format. This allows for easy integration into an automated workflow software and distributed data processing services.

For expressing the quality of a measured data series, we introduced a probability concept which assigns each value a likelihood of being "good" data. Here, "good" is interpreted in a statistical sense as belonging to an expected probability distribution. Some of the tests influence not only the probability of a single point but may also impact on the probability of its neighboring points.

We tested the software with multi-annual hourly ozone and temperature data from the database of the Tropospheric Ozone Assessment Report (TOAR). Preliminary results indicate that the concept works well and is able to deal with a large and heterogeneous dataset such as the global collection of ozone data in the TOAR database.
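
The ideas in the abstract (tests configured through a JSON control file, run sequentially, each assigning a probability of being "good", with some tests also affecting neighboring points) can be illustrated with a minimal Python sketch. Everything specific here is an assumption made for illustration and is not taken from the package itself: the control-file keys, the test names `range` and `spike`, the thresholds, the probability values, and the rule of combining tests by taking the minimum probability per point.

```python
import json

# Hypothetical control file for one variable. The schema (keys, test names,
# thresholds, probabilities) is illustrative only, not the package's format.
CONTROL_JSON = """
{
  "variable": "ozone",
  "tests": [
    {"name": "range", "min": 0.0, "max": 200.0, "p_fail": 0.0},
    {"name": "spike", "max_step": 50.0, "p_fail": 0.2, "p_neighbor": 0.6}
  ]
}
"""


def range_test(values, cfg):
    """Give p_fail to values outside [min, max], and 1.0 to all others."""
    return [1.0 if cfg["min"] <= v <= cfg["max"] else cfg["p_fail"]
            for v in values]


def spike_test(values, cfg):
    """Lower the probability of an isolated spike and, more mildly,
    that of its two neighboring points."""
    probs = [1.0] * len(values)
    for i in range(1, len(values) - 1):
        left = values[i] - values[i - 1]
        right = values[i] - values[i + 1]
        # A spike departs from both neighbors in the same direction
        # by more than the allowed step.
        if left * right > 0 and min(abs(left), abs(right)) > cfg["max_step"]:
            probs[i] = min(probs[i], cfg["p_fail"])
            for j in (i - 1, i + 1):
                probs[j] = min(probs[j], cfg["p_neighbor"])
    return probs


# Registry mapping the test names used in the control file to functions.
TESTS = {"range": range_test, "spike": spike_test}


def run_qc(values, control):
    """Run the configured tests in order and keep, for each point, the
    lowest probability assigned by any test (an assumed combination rule)."""
    probs = [1.0] * len(values)
    for cfg in control["tests"]:
        test_probs = TESTS[cfg["name"]](values, cfg)
        probs = [min(p, q) for p, q in zip(probs, test_probs)]
    return probs


control = json.loads(CONTROL_JSON)
series = [30.0, 32.0, 150.0, 31.0, -5.0, 29.0]  # made-up hourly values
probs = run_qc(series, control)
print(probs)  # [1.0, 0.6, 0.2, 0.6, 0.0, 1.0]
```

In this toy series the value 150.0 is flagged as a spike and pulls down the probability of the points on either side of it, while -5.0 fails the range test outright. A real implementation would also need to handle missing values and irregular time steps, which this sketch ignores.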

Contributing Institute(s):