Explainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset

Stadtler, Scarlet; Betancourt, Clara; Roscher, Ribana
doi:10.3390/make4010008
000906258 001__ 906258
000906258 005__ 20230127125340.0
000906258 0247_ $$2doi$$a10.3390/make4010008
000906258 0247_ $$2Handle$$a2128/30694
000906258 0247_ $$2altmetric$$aaltmetric:123030544
000906258 0247_ $$2WOS$$aWOS:000774979600001
000906258 037__ $$aFZJ-2022-01329
000906258 082__ $$a004
000906258 1001_ $$0P:(DE-Juel1)180752$$aStadtler, Scarlet$$b0$$eCorresponding author$$ufzj
000906258 245__ $$aExplainable Machine Learning Reveals Capabilities, Redundancy, and Limitations of a Geospatial Air Quality Benchmark Dataset
000906258 260__ $$aBasel$$bMDPI$$c2022
000906258 3367_ $$2DRIVER$$aarticle
000906258 3367_ $$2DataCite$$aOutput Types/Journal article
000906258 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1672835814_10869
000906258 3367_ $$2BibTeX$$aARTICLE
000906258 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000906258 3367_ $$00$$2EndNote$$aJournal Article
000906258 520__ $$aAir quality is relevant to society because it poses environmental risks to humans and nature. We use explainable machine learning in air quality research by analyzing model predictions in relation to the underlying training data. The data originate from worldwide ozone observations, paired with geospatial data. We use two different architectures: a neural network and a random forest trained on various geospatial data to predict multi-year averages of the air pollutant ozone. To understand how both models function, we explain how they represent the training data and derive their predictions. By focusing on inaccurate predictions and explaining why these predictions fail, we can (i) identify underrepresented samples, (ii) flag unexpected inaccurate predictions, and (iii) point to training samples irrelevant for predictions on the test set. Based on the underrepresented samples, we suggest where to build new measurement stations. We also show which training samples do not substantially contribute to the model performance. This study demonstrates the application of explainable machine learning beyond simply explaining the trained model.
000906258 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
000906258 536__ $$0G:(EU-Grant)787576$$aIntelliAQ - Artificial Intelligence for Air Quality (787576)$$c787576$$fERC-2017-ADG$$x1
000906258 536__ $$0G:(DE-Juel1)kiste_20200501$$aAI Strategy for Earth system data (kiste_20200501)$$ckiste_20200501$$fAI Strategy for Earth system data$$x2
000906258 536__ $$0G:(DE-Juel-1)ESDE$$aEarth System Data Exploration (ESDE)$$cESDE$$x3
000906258 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
000906258 7001_ $$0P:(DE-Juel1)171435$$aBetancourt, Clara$$b1
000906258 7001_ $$0P:(DE-Juel1)186079$$aRoscher, Ribana$$b2
000906258 773__ $$0PERI:(DE-600)2934680-0$$a10.3390/make4010008$$gVol. 4, no. 1, p. 150 - 171$$n1$$p150 - 171$$tMachine learning and knowledge extraction$$v4$$x2504-4990$$y2022
000906258 8564_ $$uhttps://juser.fz-juelich.de/record/906258/files/stadtler_make.pdf$$yOpenAccess
000906258 909CO $$ooai:juser.fz-juelich.de:906258$$pdnbdelivery$$pec_fundedresources$$pVDB$$pdriver$$popen_access$$popenaire
000906258 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180752$$aForschungszentrum Jülich$$b0$$kFZJ
000906258 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)171435$$aForschungszentrum Jülich$$b1$$kFZJ
000906258 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)186079$$aForschungszentrum Jülich$$b2$$kFZJ
000906258 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
000906258 9141_ $$y2022
000906258 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000906258 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000906258 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2020-09-03
000906258 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2020-09-03
000906258 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2022-08-22T17:05:49Z
000906258 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2022-08-22T17:05:49Z
000906258 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Blind peer review$$d2022-08-22T17:05:49Z
000906258 915__ $$0LIC:(DE-HGF)CCBYNV$$2V:(DE-HGF)$$aCreative Commons Attribution CC BY (No Version)$$bDOAJ$$d2022-08-22T17:05:49Z
000906258 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2022-11-17
000906258 915__ $$0StatID:(DE-HGF)0112$$2StatID$$aWoS$$bEmerging Sources Citation Index$$d2022-11-17
000906258 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2022-11-17
000906258 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2022-11-17
000906258 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2022-11-17
000906258 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
000906258 980__ $$ajournal
000906258 980__ $$aVDB
000906258 980__ $$aI:(DE-Juel1)JSC-20090406
000906258 980__ $$aUNRESTRICTED
000906258 9801_ $$aFullTexts