Data leakage in machine learning: A conceptual take

Komeyer, Vera; Li, Jingwei; Reuter, Martin; Wolfers, Thomas; Patil, Kaustubh
001034781 001__ 1034781
001034781 005__ 20250203103400.0
001034781 037__ $$aFZJ-2024-07535
001034781 1001_ $$0P:(DE-Juel1)187351$$aKomeyer, Vera$$b0$$eCorresponding author
001034781 1112_ $$aDGKN$$cFrankfurt am Main$$d2024-03-06 - 2024-03-09$$wGermany
001034781 245__ $$aData leakage in machine learning: A conceptual take
001034781 260__ $$c2024
001034781 3367_ $$033$$2EndNote$$aConference Paper
001034781 3367_ $$2DataCite$$aOther
001034781 3367_ $$2BibTeX$$aINPROCEEDINGS
001034781 3367_ $$2DRIVER$$aconferenceObject
001034781 3367_ $$2ORCID$$aLECTURE_SPEECH
001034781 3367_ $$0PUB:(DE-HGF)6$$2PUB:(DE-HGF)$$aConference Presentation$$bconf$$mconf$$s1736238240_13392$$xInvited
001034781 520__ $$aSymposium: Machine learning (ML) and artificial intelligence (AI) are increasingly being applied to study how individual differences in the brain can manifest as distinct psychiatric illnesses. These models can help us establish the neural correlates of mental distress and predict individual-level diagnosis, symptoms, trajectories, and treatment responses. To realize the full potential of these models, it is important to recognize their data requirements as well as biases in data and modeling choices that can limit the applicability of the models and the insights they provide. Biases in these models and data can lead to inaccurate and unfair predictions and overlook individual variations, which can have serious consequences for patients and can perpetuate and amplify existing health disparities and inequalities. These biases may arise from methodological choices including the neuroimaging modality and state, behavioral phenotypes, data transformation, sample size, population, and modeling pipelines. It is crucial to carefully evaluate the risks associated with AI/ML-based modeling, such as biases, and to develop strategies to identify and mitigate them. In doing so, we can improve the accuracy, fairness, and reliability of the predictions and ensure that they benefit all patients equally. This symposium will discuss opportunities and challenges related to the application of AI/ML to neuroimaging data from both applied and conceptual perspectives. Data Leakage Talk: ML's popularity stems from the promise of sample-level prediction using high-dimensional data. However, if ML pipelines are not properly implemented and evaluated, data leakage may yield overoptimistic performance estimates and models that fail to generalize to new data. In this talk I will discuss the challenges associated with data leakage and their remedies.
001034781 536__ $$0G:(DE-HGF)POF4-5254$$a5254 - Neuroscientific Data Analytics and AI (POF4-525)$$cPOF4-525$$fPOF IV$$x0
001034781 536__ $$0G:(GEPRIS)431549029$$aDFG project G:(GEPRIS)431549029 - SFB 1451: Schlüsselmechanismen normaler und krankheitsbedingt gestörter motorischer Kontrolle (431549029)$$c431549029$$x1
001034781 7001_ $$0P:(DE-Juel1)172843$$aPatil, Kaustubh$$b1$$eCorresponding author
001034781 7001_ $$0P:(DE-HGF)0$$aReuter, Martin$$b2
001034781 7001_ $$0P:(DE-HGF)0$$aWolfers, Thomas$$b3
001034781 7001_ $$0P:(DE-Juel1)164828$$aLi, Jingwei$$b4
001034781 909CO $$ooai:juser.fz-juelich.de:1034781$$pVDB
001034781 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)187351$$aForschungszentrum Jülich$$b0$$kFZJ
001034781 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172843$$aForschungszentrum Jülich$$b1$$kFZJ
001034781 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)164828$$aForschungszentrum Jülich$$b4$$kFZJ
001034781 9131_ $$0G:(DE-HGF)POF4-525$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5254$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vDecoding Brain Organization and Dysfunction$$x0
001034781 9141_ $$y2024
001034781 920__ $$lyes
001034781 9201_ $$0I:(DE-Juel1)INM-7-20090406$$kINM-7$$lGehirn & Verhalten$$x0
001034781 980__ $$aconf
001034781 980__ $$aVDB
001034781 980__ $$aI:(DE-Juel1)INM-7-20090406
001034781 980__ $$aUNRESTRICTED