TY  - JOUR
AU  - Stadtler, Scarlet
AU  - Betancourt, Clara
AU  - Roscher, Ribana
TI  - Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset
JO  - Machine Learning and Knowledge Extraction
VL  - 4
IS  - 1
SN  - 2504-4990
CY  - Basel
PB  - MDPI
M1  - FZJ-2022-01329
SP  - 150
EP  - 171
PY  - 2022
AB  - Air quality is relevant to society because air pollution poses risks to human health and ecosystems. We apply explainable machine learning in air quality research by analyzing model predictions in relation to the underlying training data. The data originate from worldwide ozone observations paired with geospatial data. We use two different architectures, a neural network and a random forest, trained on various geospatial data to predict multi-year averages of the air pollutant ozone. To understand how both models function, we explain how they represent the training data and derive their predictions. By focusing on inaccurate predictions and explaining why these predictions fail, we can (i) identify underrepresented samples, (ii) flag unexpected inaccurate predictions, and (iii) point to training samples irrelevant for predictions on the test set. Based on the underrepresented samples, we suggest where to build new measurement stations. We also show which training samples do not substantially contribute to the model performance. This study demonstrates the application of explainable machine learning beyond simply explaining the trained model.
LB  - PUB:(DE-HGF)16
AN  - WOS:000774979600001
DO  - 10.3390/make4010008
UR  - https://juser.fz-juelich.de/record/906258
ER  -