Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Stadtler, Scarlet; Betancourt, Clara; Roscher, Ribana

doi:10.3390/make4010008

Marc 21

001			906258
005			20230127125340.0
024	7	_	\|a 10.3390/make4010008 \|2 doi
024	7	_	\|a 2128/30694 \|2 Handle
024	7	_	\|a altmetric:123030544 \|2 altmetric
024	7	_	\|a WOS:000774979600001 \|2 WOS
037	_	_	\|a FZJ-2022-01329
082	_	_	\|a 004
100	1	_	\|a Stadtler, Scarlet \|0 P:(DE-Juel1)180752 \|b 0 \|e Corresponding author \|u fzj
245	_	_	\|a Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset
260	_	_	\|a Basel \|c 2022 \|b MDPI
336	7	_	\|a article \|2 DRIVER
336	7	_	\|a Output Types/Journal article \|2 DataCite
336	7	_	\|a Journal Article \|b journal \|m journal \|0 PUB:(DE-HGF)16 \|s 1672835814_10869 \|2 PUB:(DE-HGF)
336	7	_	\|a ARTICLE \|2 BibTeX
336	7	_	\|a JOURNAL_ARTICLE \|2 ORCID
336	7	_	\|a Journal Article \|0 0 \|2 EndNote
520	_	_	\|a Air quality is relevant to society because it poses environmental risks to humans and nature. We use explainable machine learning in air quality research by analyzing model predictions in relation to the underlying training data. The data originate from worldwide ozone observations, paired with geospatial data. We use two different architectures: a neural network and a random forest trained on various geospatial data to predict multi-year averages of the air pollutant ozone. To understand how both models function, we explain how they represent the training data and derive their predictions. By focusing on inaccurate predictions and explaining why these predictions fail, we can (i) identify underrepresented samples, (ii) flag unexpected inaccurate predictions, and (iii) point to training samples irrelevant for predictions on the test set. Based on the underrepresented samples, we suggest where to build new measurement stations. We also show which training samples do not substantially contribute to the model performance. This study demonstrates the application of explainable machine learning beyond simply explaining the trained model.
536	_	_	\|a 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511) \|0 G:(DE-HGF)POF4-5111 \|c POF4-511 \|f POF IV \|x 0
536	_	_	\|a IntelliAQ - Artificial Intelligence for Air Quality (787576) \|0 G:(EU-Grant)787576 \|c 787576 \|f ERC-2017-ADG \|x 1
536	_	_	\|a AI Strategy for Earth system data (kiste_20200501) \|0 G:(DE-Juel1)kiste_20200501 \|c kiste_20200501 \|f AI Strategy for Earth system data \|x 2
536	_	_	\|0 G:(DE-Juel-1)ESDE \|a Earth System Data Exploration (ESDE) \|c ESDE \|x 3
588	_	_	\|a Dataset connected to CrossRef, Journals: juser.fz-juelich.de
700	1	_	\|a Betancourt, Clara \|0 P:(DE-Juel1)171435 \|b 1
700	1	_	\|a Roscher, Ribana \|0 P:(DE-Juel1)186079 \|b 2
773	_	_	\|a 10.3390/make4010008 \|g Vol. 4, no. 1, p. 150 - 171 \|0 PERI:(DE-600)2934680-0 \|n 1 \|p 150 - 171 \|t Machine learning and knowledge extraction \|v 4 \|y 2022 \|x 2504-4990
856	4	_	\|u https://juser.fz-juelich.de/record/906258/files/stadtler_make.pdf \|y OpenAccess
909	C	O	\|o oai:juser.fz-juelich.de:906258 \|p openaire \|p open_access \|p driver \|p VDB \|p ec_fundedresources \|p dnbdelivery
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 0 \|6 P:(DE-Juel1)180752
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 1 \|6 P:(DE-Juel1)171435
910	1	_	\|a Forschungszentrum Jülich \|0 I:(DE-588b)5008462-8 \|k FZJ \|b 2 \|6 P:(DE-Juel1)186079
913	1	_	\|a DE-HGF \|b Key Technologies \|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action \|1 G:(DE-HGF)POF4-510 \|0 G:(DE-HGF)POF4-511 \|3 G:(DE-HGF)POF4 \|2 G:(DE-HGF)POF4-500 \|4 G:(DE-HGF)POF \|v Enabling Computational- & Data-Intensive Science and Engineering \|9 G:(DE-HGF)POF4-5111 \|x 0
914	1	_	\|y 2022
915	_	_	\|a Creative Commons Attribution CC BY 4.0 \|0 LIC:(DE-HGF)CCBY4 \|2 HGFVOC
915	_	_	\|a OpenAccess \|0 StatID:(DE-HGF)0510 \|2 StatID
915	_	_	\|a Article Processing Charges \|0 StatID:(DE-HGF)0561 \|2 StatID \|d 2020-09-03
915	_	_	\|a Fees \|0 StatID:(DE-HGF)0700 \|2 StatID \|d 2020-09-03
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0501 \|2 StatID \|b DOAJ Seal \|d 2022-08-22T17:05:49Z
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0500 \|2 StatID \|b DOAJ \|d 2022-08-22T17:05:49Z
915	_	_	\|a Peer Review \|0 StatID:(DE-HGF)0030 \|2 StatID \|b DOAJ : Blind peer review \|d 2022-08-22T17:05:49Z
915	_	_	\|a Creative Commons Attribution CC BY (No Version) \|0 LIC:(DE-HGF)CCBYNV \|2 V:(DE-HGF) \|b DOAJ \|d 2022-08-22T17:05:49Z
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0199 \|2 StatID \|b Clarivate Analytics Master Journal List \|d 2022-11-17
915	_	_	\|a WoS \|0 StatID:(DE-HGF)0112 \|2 StatID \|b Emerging Sources Citation Index \|d 2022-11-17
915	_	_	\|a DBCoverage \|0 StatID:(DE-HGF)0150 \|2 StatID \|b Web of Science Core Collection \|d 2022-11-17
915	_	_	\|a Article Processing Charges \|0 StatID:(DE-HGF)0561 \|2 StatID \|d 2022-11-17
915	_	_	\|a Fees \|0 StatID:(DE-HGF)0700 \|2 StatID \|d 2022-11-17
920	1	_	\|0 I:(DE-Juel1)JSC-20090406 \|k JSC \|l Jülich Supercomputing Center \|x 0
980	_	_	\|a journal
980	_	_	\|a VDB
980	_	_	\|a I:(DE-Juel1)JSC-20090406
980	_	_	\|a UNRESTRICTED
980	1	_	\|a FullTexts
