Type | Amount | VAT | Currency | Share | Status | Cost centre
Hybrid-OA | 0.00 | 0.00 | EUR | | (DEAL) | ZB
Sum | 0.00 | 0.00 | EUR | | |
Total | 0.00 | | | | |
Journal Article FZJ-2026-02344

Impact of leakage on data harmonization in machine learning pipelines in class imbalance across sites


2026
Elsevier Amsterdam

Neurocomputing 680, 133146 [10.1016/j.neucom.2026.133146]


Please use a persistent id in citations: doi:10.1016/j.neucom.2026.133146

Abstract: Due to the cost and complexity of data collection in biomedical domains, it is a common practice to combine data from multiple sites to obtain large datasets required for machine learning. However, undesired site-specific variability presents challenges. Data harmonization aims to address this issue by removing site-specific variance while preserving biologically relevant information. We show that the widely used ComBat-based harmonization improvements are driven by data leakage due to illicit use of target information when class labels are imbalanced across sites, a common scenario in biomedical domains. We propose a novel approach, PrettYharmonize, which leverages subtle differences in data harmonized using different pretended target values. Using controlled benchmark datasets and real-world magnetic resonance imaging and clinical ICU data, we demonstrate that our leakage-free PrettYharmonize method achieves performance comparable to leakage-prone methods. As such, it is a viable method to integrate ComBat-based methods into machine learning applications.


Contributing Institute(s):
  1. Gehirn & Verhalten (INM-7)
Research Program(s):
  1. 5254 - Neuroscientific Data Analytics and AI (POF4-525)

Appears in the scientific report 2026
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; Clarivate Analytics Master Journal List ; Current Contents - Engineering, Computing and Technology ; Ebsco Academic Search ; Essential Science Indicators ; IF >= 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal articles
Institute collections > INM > INM-7
Workflow collections > Public records
Workflow collections > Publication charges
Publications database
Open Access

Record created 2026-04-27, last modified 2026-06-18

