Stacking ensemble for age-prediction improves performance and privacy

Antonopoulos, Georgios; Raimondo, Federico; More, Shammi; Patil, Kaustubh

Poster (After Call)

FZJ-2024-01166

Antonopoulos, G. (corresponding author; FZJ); More, S. (FZJ); Raimondo, F. (FZJ); Patil, K. (FZJ)

2023

Helmholtz AI, Hamburg, Germany, 12 Jun 2023 - 14 Jun 2023 [10.34734/FZJ-2024-01166]

Please use a persistent id in citations: doi:10.34734/FZJ-2024-01166

Abstract: Brain-age prediction (BAP) using structural MRI has shown great potential for studying healthy aging and disease. Two highly desirable properties for BAP are high accuracy and data privacy. We propose a stacking ensemble model (SEM) that improves both compared with current implementations.

Our SEM consists of two levels (L0 and L1). At L0, we used an 873-parcel atlas to group gray-matter volume (GMV) voxels and trained one GLMnet model per parcel. The out-of-sample (OOS) predictions of all L0 models, obtained with 3-fold cross-validation, were used as features to train a GLMnet model at L1, which provides the final age prediction. To make predictions on an independent test set, the L0 models were retrained on the whole training dataset (Figure 1).

We explored two ways to train the models at L0 and L1: (i) pooling the data from different sites, and (ii) treating each site separately and then averaging their outcomes. For comparison with current standards, we also tested models that use the average GMV of each parcel as L1 inputs. Additionally, to test the case where enough data are available at the test site, we estimated L0-level OOS predictions on the test data and then used them to obtain predictions from the L1 models. The pooled and site-wise schemes provide different levels and types of privacy advantage. The test-site scheme is advantageous for clinical applications, as only the L0-level predictions, not the raw data, need to be shared.

We used T1w MRI scans of healthy subjects from 4 open datasets (IXI, eNKI, CamCAN and 1000Gehirne), each with n>500 (total N=3103, age range 18-90 years). We performed a leave-one-site-out analysis and tested the impact of using one or more datasets for training.

The highest test performance was observed for the set-ups in which the L0-level predictions came from the test data. The best set-up used pooled L0 predictions from three sites to train the L1 model (MAE=4.7 years), followed by L1 models trained on the 3 sites separately (MAE=4.8 years). Obtaining L0 predictions at the test site also improves data privacy, as the L0 analysis can be performed at the application site and only the predictions need to be shared. Set-ups based on mean GMV performed worst (MAE=6.5-7.3 years). We also found that the L0 models provide a robust interpretation of regional aging effects: the Pearson correlation of chronological age with the parcel-wise L0 predicted age was higher than its correlation with the parcel-wise GMV.
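
The two-level scheme in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: closed-form ridge regression stands in for GLMnet, the data are synthetic (hypothetical `simulate_site` helper, 5 toy parcels of 10 voxels instead of an 873-parcel atlas), and all function names are hypothetical. It shows L0 per-parcel models producing 3-fold OOS predictions, an L1 model trained on those predictions, L0 refitting on the full training data, and the two test-time paths: (a) raw test data passed through the trained L0 models, and (b) OOS L0 predictions computed locally at the test site, so that only the prediction matrix is shared.

```python
import numpy as np


def ridge_fit(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept.

    ASSUMPTION: stands in for the GLMnet (elastic-net) models used in the poster.
    """
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc = X - x_mean
    A = Xc.T @ Xc + alpha * np.eye(X.shape[1])
    coef = np.linalg.solve(A, Xc.T @ (y - y_mean))
    return coef, y_mean - x_mean @ coef


def ridge_predict(model, X):
    coef, intercept = model
    return X @ coef + intercept


def l0_oos_predictions(parcels, y, n_folds=3, alpha=1.0, seed=0):
    """Out-of-sample L0 predictions via k-fold CV: one column per parcel."""
    n = len(y)
    folds = np.array_split(np.random.default_rng(seed).permutation(n), n_folds)
    oos = np.empty((n, len(parcels)))
    for p, X in enumerate(parcels):
        for test_idx in folds:
            train_idx = np.setdiff1d(np.arange(n), test_idx)
            model = ridge_fit(X[train_idx], y[train_idx], alpha)
            oos[test_idx, p] = ridge_predict(model, X[test_idx])
    return oos


def fit_stack(parcels, y, alpha=1.0):
    """Train L1 on OOS L0 predictions, then refit every L0 model on all training data."""
    l1 = ridge_fit(l0_oos_predictions(parcels, y, alpha=alpha), y, alpha)
    l0 = [ridge_fit(X, y, alpha) for X in parcels]
    return l0, l1


def predict_stack(l0, l1, parcels):
    """Test-time path when raw test data pass through the trained L0 models."""
    l0_preds = np.column_stack([ridge_predict(m, X) for m, X in zip(l0, parcels)])
    return ridge_predict(l1, l0_preds)


def simulate_site(n, weights, rng, noise=5.0):
    """HYPOTHETICAL synthetic data: each 'voxel' is age * weight + Gaussian noise."""
    age = rng.uniform(18, 90, size=n)
    parcels = [np.outer(age, w) + rng.normal(0, noise, (n, len(w))) for w in weights]
    return parcels, age


rng = np.random.default_rng(42)
weights = [rng.normal(0, 1, 10) for _ in range(5)]  # 5 toy parcels x 10 voxels
train_parcels, train_age = simulate_site(300, weights, rng)  # "training sites"
test_parcels, test_age = simulate_site(120, weights, rng)    # "test site"

l0, l1 = fit_stack(train_parcels, train_age)

# (a) Raw test data go through the L0 models trained on the training sites.
mae_raw = np.mean(np.abs(predict_stack(l0, l1, test_parcels) - test_age))

# (b) The test site computes its own OOS L0 predictions (this needs local ages)
#     and shares only that prediction matrix; the L1 model maps it to final ages.
test_l0 = l0_oos_predictions(test_parcels, test_age)
mae_local = np.mean(np.abs(ridge_predict(l1, test_l0) - test_age))

print(f"MAE, L0 trained on training sites: {mae_raw:.2f} years")
print(f"MAE, L0 OOS at the test site:      {mae_local:.2f} years")
```

L1 is trained on OOS rather than in-sample L0 predictions so that it does not learn to trust L0 outputs that have been overfitted to the training subjects; this is the standard design choice in stacking. The toy numbers printed here say nothing about the MAEs reported in the abstract.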

Contributing Institute(s):

Gehirn & Verhalten (Brain & Behaviour, INM-7)

Appears in the scientific report 2023

Database coverage:
OpenAccess

Record created 2024-01-30, last modified 2024-02-26
