001 | 1018240 | ||
005 | 20231128201904.0 | ||
024 | 7 | _ | |a 10.48550/ARXIV.2311.04179 |2 doi |
024 | 7 | _ | |a 10.34734/FZJ-2023-04636 |2 datacite_doi |
037 | _ | _ | |a FZJ-2023-04636 |
100 | 1 | _ | |a Sasse, Leonard |0 P:(DE-Juel1)190306 |b 0 |u fzj |
245 | _ | _ | |a On Leakage in Machine Learning Pipelines |
260 | _ | _ | |c 2023 |b arXiv |
336 | 7 | _ | |a Preprint |b preprint |m preprint |0 PUB:(DE-HGF)25 |s 1701175918_23345 |2 PUB:(DE-HGF) |
336 | 7 | _ | |a WORKING_PAPER |2 ORCID |
336 | 7 | _ | |a Electronic Article |0 28 |2 EndNote |
336 | 7 | _ | |a preprint |2 DRIVER |
336 | 7 | _ | |a ARTICLE |2 BibTeX |
336 | 7 | _ | |a Output Types/Working Paper |2 DataCite |
520 | _ | _ | |a Machine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage typically resulting in overoptimistic performance estimates and failure to generalize to new data. This can have severe negative financial and societal implications. Our aim is to expand understanding associated with causes leading to leakage when designing, implementing, and evaluating ML pipelines. Illustrated by concrete examples, we provide a comprehensive overview and discussion of various types of leakage that may arise in ML pipelines. |
536 | _ | _ | |a 5254 - Neuroscientific Data Analytics and AI (POF4-525) |0 G:(DE-HGF)POF4-5254 |c POF4-525 |f POF IV |x 0 |
588 | _ | _ | |a Dataset connected to DataCite |
650 | _ | 7 | |a Machine Learning (cs.LG) |2 Other |
650 | _ | 7 | |a Artificial Intelligence (cs.AI) |2 Other |
650 | _ | 7 | |a FOS: Computer and information sciences |2 Other |
700 | 1 | _ | |a Nicolaisen-Sobesky, Eliana |0 P:(DE-HGF)0 |b 1 |
700 | 1 | _ | |a Dukart, Jürgen |0 P:(DE-Juel1)177727 |b 2 |u fzj |
700 | 1 | _ | |a Eickhoff, Simon B. |0 P:(DE-Juel1)131678 |b 3 |u fzj |
700 | 1 | _ | |a Götz, Michael |0 P:(DE-HGF)0 |b 4 |
700 | 1 | _ | |a Hamdan, Sami |0 P:(DE-Juel1)184874 |b 5 |u fzj |
700 | 1 | _ | |a Komeyer, Vera |0 P:(DE-Juel1)187351 |b 6 |u fzj |
700 | 1 | _ | |a Kulkarni, Abhijit |0 P:(DE-HGF)0 |b 7 |
700 | 1 | _ | |a Lahnakoski, Juha |0 P:(DE-Juel1)179423 |b 8 |u fzj |
700 | 1 | _ | |a Love, Bradley C. |0 P:(DE-HGF)0 |b 9 |
700 | 1 | _ | |a Raimondo, Federico |0 P:(DE-Juel1)185083 |b 10 |u fzj |
700 | 1 | _ | |a Patil, Kaustubh R. |0 P:(DE-Juel1)172843 |b 11 |e Corresponding author |u fzj |
773 | _ | _ | |a 10.48550/ARXIV.2311.04179 |
856 | 4 | _ | |u https://juser.fz-juelich.de/record/1018240/files/on_leakage.pdf |y OpenAccess |
909 | C | O | |o oai:juser.fz-juelich.de:1018240 |p openaire |p open_access |p VDB |p driver |p dnbdelivery |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 0 |6 P:(DE-Juel1)190306 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 2 |6 P:(DE-Juel1)177727 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 3 |6 P:(DE-Juel1)131678 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 5 |6 P:(DE-Juel1)184874 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 6 |6 P:(DE-Juel1)187351 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 8 |6 P:(DE-Juel1)179423 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 10 |6 P:(DE-Juel1)185083 |
910 | 1 | _ | |a Forschungszentrum Jülich |0 I:(DE-588b)5008462-8 |k FZJ |b 11 |6 P:(DE-Juel1)172843 |
913 | 1 | _ | |a DE-HGF |b Key Technologies |l Natural, Artificial and Cognitive Information Processing |1 G:(DE-HGF)POF4-520 |0 G:(DE-HGF)POF4-525 |3 G:(DE-HGF)POF4 |2 G:(DE-HGF)POF4-500 |4 G:(DE-HGF)POF |v Decoding Brain Organization and Dysfunction |9 G:(DE-HGF)POF4-5254 |x 0 |
914 | 1 | _ | |y 2023 |
915 | _ | _ | |a OpenAccess |0 StatID:(DE-HGF)0510 |2 StatID |
920 | _ | _ | |l yes |
920 | 1 | _ | |0 I:(DE-Juel1)INM-7-20090406 |k INM-7 |l Gehirn & Verhalten |x 0 |
980 | _ | _ | |a preprint |
980 | _ | _ | |a VDB |
980 | _ | _ | |a UNRESTRICTED |
980 | _ | _ | |a I:(DE-Juel1)INM-7-20090406 |
980 | 1 | _ | |a FullTexts |