Preprint FZJ-2023-04636

On Leakage in Machine Learning Pipelines

2023
arXiv

arXiv [10.48550/ARXIV.2311.04179]

Please use a persistent id in citations: doi:10.48550/ARXIV.2311.04179

Abstract: Machine learning (ML) provides powerful tools for predictive modeling. ML's popularity stems from the promise of sample-level prediction, with applications across a variety of fields from physics and marketing to healthcare. However, if not properly implemented and evaluated, ML pipelines may contain leakage, which typically results in overoptimistic performance estimates and failure to generalize to new data. This can have severe negative financial and societal implications. Our aim is to expand understanding of the causes of leakage when designing, implementing, and evaluating ML pipelines. Illustrated by concrete examples, we provide a comprehensive overview and discussion of the various types of leakage that may arise in ML pipelines.

Keyword(s): Machine Learning (cs.LG) ; Artificial Intelligence (cs.AI) ; FOS: Computer and information sciences


Contributing Institute(s):
  1. Gehirn & Verhalten (INM-7)
Research Program(s):
  1. 5254 - Neuroscientific Data Analytics and AI (POF4-525)

Appears in the scientific report 2023
Database coverage:
OpenAccess

The record appears in these collections:
Institute Collections > INM > INM-7
Document types > Reports > Preprints
Workflow collections > Public records
Publications database
Open Access

 Record created 2023-11-20, last modified 2023-11-28

