Confound-leakage: Confound Removal in Machine Learning Leads to Leakage

Hamdan, Sami; Schwender, Holger; Eickhoff, Simon; Polier, Georg von; Patil, Kaustubh; Weis, Susanne; Love, Bradley C

doi:10.1093/gigascience/giad071

Type	Amount	VAT	Currency	Share	Status	Cost centre
APC	2240.95	0.00	EUR	100.00 %	(payment completed)	ZB
Sum	2240.95	0.00	EUR
Total	2240.95

Journal Article

FZJ-2023-03119

Hamdan, S. (FZJ*) ; Love, B. C. ; Polier, G. v. ; Weis, S. (FZJ*) ; Schwender, H. ; Eickhoff, S. (FZJ*) ; Patil, K. (FZJ*, corresponding author)

2023
Oxford University Press, Oxford

GigaScience 12, giad071 (2023) [10.1093/gigascience/giad071]

Please use a persistent id in citations: doi:10.1093/gigascience/giad071 doi:10.34734/FZJ-2023-03119

Abstract:

Background: Machine learning (ML) approaches are a crucial component of modern data analysis in many fields, including epidemiology and medicine. Nonlinear ML methods often achieve accurate predictions, for instance, in personalized medicine, as they are capable of modeling complex relationships between features and the target. Problematically, ML models and their predictions can be biased by confounding information present in the features. To remove this spurious signal, researchers often employ featurewise linear confound regression (CR). While this is considered a standard approach for dealing with confounding, possible pitfalls of using CR in ML pipelines are not fully understood.

Results: We provide new evidence that, contrary to general expectations, linear confound regression can increase the risk of confounding when combined with nonlinear ML approaches. Using a simple framework that uses the target as a confound, we show that information leaked via CR can increase null or moderate effects to near-perfect prediction. By shuffling the features, we provide evidence that this increase is indeed due to confound-leakage and not due to revealing of information. We then demonstrate the danger of confound-leakage in a real-world clinical application where the accuracy of predicting attention-deficit/hyperactivity disorder is overestimated using speech-derived features when using depression as a confound.

Conclusions: Mishandling or even amplifying confounding effects when building ML models due to confound-leakage, as shown, can lead to untrustworthy, biased, and unfair predictions. Our exposé of the confound-leakage pitfall and provided guidelines for dealing with it can help create more robust and trustworthy ML models.
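
The mechanism described in the abstract can be illustrated with a small sketch. This is not the authors' code: it assumes that featurewise linear CR is implemented as one ordinary least-squares fit per feature, the function names `fit_confound_regression` and `remove_confound` are hypothetical, and the eleven-sample toy data set is invented purely for illustration. Following the abstract's framing, the target itself is used as the confound.

```python
import numpy as np


def fit_confound_regression(X, C):
    """Fit one OLS model per feature column: X[:, j] ~ intercept + C."""
    design = np.column_stack([np.ones(len(C)), C])
    coef, *_ = np.linalg.lstsq(design, X, rcond=None)
    return coef  # shape: (1 + n_confounds, n_features)


def remove_confound(X, C, coef):
    """Subtract the confound-predicted part from every feature."""
    design = np.column_stack([np.ones(len(C)), C])
    return X - design @ coef


# Invented toy data: one binary feature that is only weakly related
# to the binary target (class means 0.5 vs. 0.6).
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1])
X = np.array([[0], [0], [0], [1], [1], [1],
              [0], [0], [1], [1], [1]], dtype=float)

# Assumption of this sketch: the target is used as the confound.
coef = fit_confound_regression(X, y)
X_res = remove_confound(X, y, coef)

# The linear association with the confound is removed ...
corr_after = np.corrcoef(X_res[:, 0], y)[0, 1]

# ... but each class now occupies its own set of residual values.
values_class0 = set(np.round(X_res[y == 0, 0], 6))
values_class1 = set(np.round(X_res[y == 1, 0], 6))

print("correlation after CR:", corr_after)
print("class 0 values:", sorted(values_class0))
print("class 1 values:", sorted(values_class1))
```

In this toy case the raw feature takes the values 0 and 1 in both classes, whereas the residualized feature takes approximately -0.5 and 0.5 in class 0 and approximately -0.6 and 0.4 in class 1. The linear correlation with the target is essentially zero, yet a nonlinear learner that can place thresholds between these values could in principle separate the classes, which is one way to picture how information may leak through CR.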

Classification:

ddc:610

Note: This work was partly supported by the Helmholtz-AI project DeGen (ZT-I-PF-5-078), the Helmholtz Portfolio Theme “Supercomputing and Modeling for the Human Brain,” and Deutsche Forschungsgemeinschaft (DFG, German Research Foundation), project-ID 431549029–SFB 1451 project B05.

Contributing Institute(s):

Gehirn & Verhalten (INM-7)

Appears in the scientific report 2023

Database coverage:
Medline ; Article Processing Charges ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; Current Contents - Agriculture, Biology and Environmental Sciences ; Current Contents - Life Sciences ; DOAJ Seal ; Ebsco Academic Search ; Essential Science Indicators ; Fees ; IF >= 5 ; JCR ; PubMed Central ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection ; Zoological Record

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > INM > INM-7
Workflow collections > Public records
Workflow collections > Publication Charges
Publications database
Open Access

Record created 2023-08-21, last modified 2024-04-29
