001043079 001__ 1043079
001043079 005__ 20250916202447.0
001043079 0247_ $$2doi$$a10.1186/s40537-025-01193-8
001043079 0247_ $$2datacite_doi$$a10.34734/FZJ-2025-02765
001043079 0247_ $$2WOS$$aWOS:001498691400001
001043079 037__ $$aFZJ-2025-02765
001043079 082__ $$a004
001043079 1001_ $$0P:(DE-Juel1)190306$$aSasse, L.$$b0
001043079 245__ $$aOverview of leakage scenarios in supervised machine learning
001043079 260__ $$aHeidelberg [u.a.]$$bSpringerOpen$$c2025
001043079 3367_ $$2DRIVER$$aarticle
001043079 3367_ $$2DataCite$$aOutput Types/Journal article
001043079 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1750757943_4685
001043079 3367_ $$2BibTeX$$aARTICLE
001043079 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001043079 3367_ $$00$$2EndNote$$aJournal Article
001043079 520__ $$aMachine learning (ML) provides powerful tools for predictive modeling. ML’s popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage typically resulting in overoptimistic performance estimates and failure to generalize to new data. This can have severe negative financial and societal implications. Our aim is to expand understanding associated with causes leading to leakage when designing, implementing, and evaluating ML pipelines. Illustrated by concrete examples, we provide a comprehensive overview and discussion of various types of leakage that may arise in ML pipelines.
001043079 536__ $$0G:(DE-HGF)POF4-5254$$a5254 - Neuroscientific Data Analytics and AI (POF4-525)$$cPOF4-525$$fPOF IV$$x0
001043079 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001043079 7001_ $$0P:(DE-Juel1)180537$$aNicolaisen, Eliana$$b1$$ufzj
001043079 7001_ $$0P:(DE-Juel1)177727$$aDukart, Jürgen$$b2
001043079 7001_ $$0P:(DE-Juel1)131678$$aEickhoff, S. B.$$b3
001043079 7001_ $$0P:(DE-HGF)0$$aGötz, M.$$b4
001043079 7001_ $$0P:(DE-HGF)0$$aHamdan, S.$$b5
001043079 7001_ $$0P:(DE-Juel1)187351$$aKomeyer, V.$$b6
001043079 7001_ $$0P:(DE-HGF)0$$aKulkarni, A.$$b7
001043079 7001_ $$0P:(DE-Juel1)179423$$aLahnakoski, J. M.$$b8
001043079 7001_ $$0P:(DE-HGF)0$$aLove, B. C.$$b9
001043079 7001_ $$0P:(DE-Juel1)185083$$aRaimondo, F.$$b10
001043079 7001_ $$0P:(DE-Juel1)172843$$aPatil, Kaustubh R.$$b11$$eCorresponding author
001043079 773__ $$0PERI:(DE-600)2780218-8$$a10.1186/s40537-025-01193-8$$gVol. 12, no. 1, p. 135$$n1$$p135$$tJournal of Big Data$$v12$$x2196-1115$$y2025
001043079 8564_ $$uhttps://juser.fz-juelich.de/record/1043079/files/Main%20paper.pdf$$yOpenAccess
001043079 8564_ $$uhttps://juser.fz-juelich.de/record/1043079/files/s40537-025-01193-8.pdf$$yOpenAccess
001043079 8767_ $$8SN-2025-00897-b$$92025-08-27$$a1200217075$$d2025-09-16$$eAPC$$jZahlung erfolgt
001043079 909CO $$ooai:juser.fz-juelich.de:1043079$$pVDB$$pdriver$$pOpenAPC$$popen_access$$popenaire$$popenCost$$pdnbdelivery
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190306$$aForschungszentrum Jülich$$b0$$kFZJ
001043079 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)190306$$a HHU Düsseldorf, MPI Leipzig$$b0
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)180537$$aForschungszentrum Jülich$$b1$$kFZJ
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)177727$$aForschungszentrum Jülich$$b2$$kFZJ
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)131678$$aForschungszentrum Jülich$$b3$$kFZJ
001043079 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)131678$$a HHU Düsseldorf$$b3
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)187351$$aForschungszentrum Jülich$$b6$$kFZJ
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)179423$$aForschungszentrum Jülich$$b8$$kFZJ
001043079 9101_ $$0I:(DE-HGF)0$$6P:(DE-Juel1)179423$$a HHU Düsseldorf$$b8
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)185083$$aForschungszentrum Jülich$$b10$$kFZJ
001043079 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172843$$aForschungszentrum Jülich$$b11$$kFZJ
001043079 9131_ $$0G:(DE-HGF)POF4-525$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5254$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vDecoding Brain Organization and Dysfunction$$x0
001043079 9141_ $$y2025
001043079 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)1160$$2StatID$$aDBCoverage$$bCurrent Contents - Engineering, Computing and Technology$$d2025-01-06
001043079 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001043079 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bJ BIG DATA-GER : 2022$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2024-04-10T15:40:52Z
001043079 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2024-04-10T15:40:52Z
001043079 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001043079 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Anonymous peer review$$d2024-04-10T15:40:52Z
001043079 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)9905$$2StatID$$aIF >= 5$$bJ BIG DATA-GER : 2022$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2025-01-06
001043079 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2025-01-06
001043079 915pc $$0PC:(DE-HGF)0000$$2APC$$aAPC keys set
001043079 915pc $$0PC:(DE-HGF)0001$$2APC$$aLocal Funding
001043079 915pc $$0PC:(DE-HGF)0002$$2APC$$aDFG OA Publikationskosten
001043079 915pc $$0PC:(DE-HGF)0003$$2APC$$aDOAJ Journal
001043079 915pc $$0PC:(DE-HGF)0113$$2APC$$aDEAL: Springer Nature 2020
001043079 920__ $$lyes
001043079 9201_ $$0I:(DE-Juel1)INM-7-20090406$$kINM-7$$lGehirn & Verhalten$$x0
001043079 9801_ $$aFullTexts
001043079 980__ $$ajournal
001043079 980__ $$aVDB
001043079 980__ $$aUNRESTRICTED
001043079 980__ $$aI:(DE-Juel1)INM-7-20090406
001043079 980__ $$aAPC