000875342 001__ 875342
000875342 005__ 20230711152627.0
000875342 0247_ $$2doi$$a10.5194/egusphere-egu2020-13357
000875342 0247_ $$2Handle$$a2128/24954
000875342 037__ $$aFZJ-2020-01965
000875342 041__ $$aEnglish
000875342 1001_ $$0P:(DE-Juel1)165903$$aKaffashzadeh, Najmeh$$b0$$eCorresponding author
000875342 1112_ $$aEGU2020: Sharing Geoscience Online$$cVienna$$d2020-05-04 - 2020-05-08$$g#shareEGU20$$wAustria
000875342 245__ $$aA Statistical Model for Automated Quality Assessment of the TOAR-II
000875342 260__ $$c2020
000875342 3367_ $$033$$2EndNote$$aConference Paper
000875342 3367_ $$2DataCite$$aOther
000875342 3367_ $$2BibTeX$$aINPROCEEDINGS
000875342 3367_ $$2DRIVER$$aconferenceObject
000875342 3367_ $$2ORCID$$aLECTURE_SPEECH
000875342 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1591168427_5616$$xOther
000875342 520__ $$aThe Tropospheric Ozone Assessment Report, phase 2 (TOAR-II) database is a global collection of in-situ ground-level ozone measurements. It also holds data on selected ozone precursors and meteorological variables. TOAR-II assembles air quality data from many different sources and thus requires a common data quality assessment (QA) to ensure the data meet the quality required for globally consistent analyses. The large volume of this database (more than 100,000 data series) necessitates the use of automated, data-driven QA procedures. Accordingly, we have developed a statistical model for automated QA. The model consists of several statistical tests that are classified into sub-groups. It assigns a QA score (an indicator ranging from 0 to 1) to each individual data point to estimate the value's plausibility. The concept is founded on statistical hypothesis testing and probability theory. The model is implemented in a Python package called AutoQA4Env. One application of AutoQA4Env is the data ingestion workflow of TOAR-II. The tool generates a data quality report, which is then sent back to the data provider for inspection. Since AutoQA4Env is easily configurable, it allows users to set quality thresholds and thus filter data according to their use case. While we primarily develop AutoQA4Env for air quality data, the same concept and model might be applicable to other databases, and the software framework is flexible enough to allow for other use cases.
000875342 536__ $$0G:(DE-HGF)POF3-512$$a512 - Data-Intensive Science and Federated Computing (POF3-512)$$cPOF3-512$$fPOF III$$x0
000875342 536__ $$0G:(EU-Grant)787576$$aIntelliAQ - Artificial Intelligence for Air Quality (787576)$$c787576$$fERC-2017-ADG$$x1
000875342 536__ $$0G:(DE-Juel-1)ESDE$$aEarth System Data Exploration (ESDE)$$cESDE$$x2
000875342 588__ $$aDataset connected to CrossRef
000875342 7001_ $$00000-0001-5812-3183$$aChang, Kai-Lan$$b1
000875342 7001_ $$0P:(DE-Juel1)16212$$aSchröder, Sabine$$b2
000875342 7001_ $$0P:(DE-Juel1)6952$$aSchultz, Martin G.$$b3
000875342 773__ $$a10.5194/egusphere-egu2020-13357
000875342 8564_ $$uhttps://juser.fz-juelich.de/record/875342/files/Abstract.pdf$$yOpenAccess
000875342 8564_ $$uhttps://juser.fz-juelich.de/record/875342/files/Presentation_Slides.pdf$$yOpenAccess
000875342 8564_ $$uhttps://juser.fz-juelich.de/record/875342/files/Abstract.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000875342 8564_ $$uhttps://juser.fz-juelich.de/record/875342/files/Presentation_Slides.pdf?subformat=pdfa$$xpdfa$$yOpenAccess
000875342 909CO $$ooai:juser.fz-juelich.de:875342$$pec_fundedresources$$pdriver$$pVDB$$popen_access$$popenaire
000875342 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000875342 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000875342 9141_ $$y2020
000875342 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)165903$$aForschungszentrum Jülich$$b0$$kFZJ
000875342 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)16212$$aForschungszentrum Jülich$$b2$$kFZJ
000875342 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)6952$$aForschungszentrum Jülich$$b3$$kFZJ
000875342 9131_ $$0G:(DE-HGF)POF3-512$$1G:(DE-HGF)POF3-510$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lSupercomputing & Big Data$$vData-Intensive Science and Federated Computing$$x0
000875342 920__ $$lyes
000875342 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000875342 980__ $$aconf
000875342 980__ $$aVDB
000875342 980__ $$aUNRESTRICTED
000875342 980__ $$aI:(DE-Juel1)JSC-20090406
000875342 980__ $$aOPENSCIENCE
000875342 9801_ $$aFullTexts