001018240 001__ 1018240
001018240 005__ 20231128201904.0
001018240 0247_ $$2doi$$a10.48550/ARXIV.2311.04179
001018240 0247_ $$2datacite_doi$$a10.34734/FZJ-2023-04636
001018240 037__ $$aFZJ-2023-04636
001018240 1001_ $$0P:(DE-Juel1)190306$$aSasse, Leonard$$b0$$ufzj
001018240 245__ $$aOn Leakage in Machine Learning Pipelines
001018240 260__ $$barXiv$$c2023
001018240 3367_ $$0PUB:(DE-HGF)25$$2PUB:(DE-HGF)$$aPreprint$$bpreprint$$mpreprint$$s1701175918_23345
001018240 3367_ $$2ORCID$$aWORKING_PAPER
001018240 3367_ $$028$$2EndNote$$aElectronic Article
001018240 3367_ $$2DRIVER$$apreprint
001018240 3367_ $$2BibTeX$$aARTICLE
001018240 3367_ $$2DataCite$$aOutput Types/Working Paper
001018240 520__ $$aMachine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage, typically resulting in overoptimistic performance estimates and failure to generalize to new data. This can have severe negative financial and societal implications. Our aim is to expand understanding of the causes of leakage when designing, implementing, and evaluating ML pipelines. Illustrated by concrete examples, we provide a comprehensive overview and discussion of various types of leakage that may arise in ML pipelines.
001018240 536__ $$0G:(DE-HGF)POF4-5254$$a5254 - Neuroscientific Data Analytics and AI (POF4-525)$$cPOF4-525$$fPOF IV$$x0
001018240 588__ $$aDataset connected to DataCite
001018240 650_7 $$2Other$$aMachine Learning (cs.LG)
001018240 650_7 $$2Other$$aArtificial Intelligence (cs.AI)
001018240 650_7 $$2Other$$aFOS: Computer and information sciences
001018240 7001_ $$0P:(DE-HGF)0$$aNicolaisen-Sobesky, Eliana$$b1
001018240 7001_ $$0P:(DE-Juel1)177727$$aDukart, Jürgen$$b2$$ufzj
001018240 7001_ $$0P:(DE-Juel1)131678$$aEickhoff, Simon B.$$b3$$ufzj
001018240 7001_ $$0P:(DE-HGF)0$$aGötz, Michael$$b4
001018240 7001_ $$0P:(DE-Juel1)184874$$aHamdan, Sami$$b5$$ufzj
001018240 7001_ $$0P:(DE-Juel1)187351$$aKomeyer, Vera$$b6$$ufzj
001018240 7001_ $$0P:(DE-HGF)0$$aKulkarni, Abhijit$$b7
001018240 7001_ $$0P:(DE-Juel1)179423$$aLahnakoski, Juha$$b8$$ufzj
001018240 7001_ $$0P:(DE-HGF)0$$aLove, Bradley C.$$b9
001018240 7001_ $$0P:(DE-Juel1)185083$$aRaimondo, Federico$$b10$$ufzj
001018240 7001_ $$0P:(DE-Juel1)172843$$aPatil, Kaustubh R.$$b11$$eCorresponding author$$ufzj
001018240 773__ $$a10.48550/ARXIV.2311.04179
001018240 8564_ $$uhttps://juser.fz-juelich.de/record/1018240/files/on_leakage.pdf$$yOpenAccess
001018240 909CO $$ooai:juser.fz-juelich.de:1018240$$popenaire$$popen_access$$pVDB$$pdriver$$pdnbdelivery
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)190306$$aForschungszentrum Jülich$$b0$$kFZJ
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)177727$$aForschungszentrum Jülich$$b2$$kFZJ
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)131678$$aForschungszentrum Jülich$$b3$$kFZJ
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)184874$$aForschungszentrum Jülich$$b5$$kFZJ
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)187351$$aForschungszentrum Jülich$$b6$$kFZJ
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)179423$$aForschungszentrum Jülich$$b8$$kFZJ
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)185083$$aForschungszentrum Jülich$$b10$$kFZJ
001018240 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172843$$aForschungszentrum Jülich$$b11$$kFZJ
001018240 9131_ $$0G:(DE-HGF)POF4-525$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5254$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vDecoding Brain Organization and Dysfunction$$x0
001018240 9141_ $$y2023
001018240 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001018240 920__ $$lyes
001018240 9201_ $$0I:(DE-Juel1)INM-7-20090406$$kINM-7$$lGehirn & Verhalten$$x0
001018240 980__ $$apreprint
001018240 980__ $$aVDB
001018240 980__ $$aUNRESTRICTED
001018240 980__ $$aI:(DE-Juel1)INM-7-20090406
001018240 9801_ $$aFullTexts