Empirical Comparison between Cross-Validation and Mutation-Validation in Model Selection

Yu, Jinyang; Morrison, Abigail; Hamdan, Sami; Sasse, Leonard; Patil, Kaustubh R.
doi:10.48550/ARXIV.2311.14079
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@ARTICLE{Yu:1021988,
      author       = {Yu, Jinyang and Hamdan, Sami and Sasse, Leonard and
                      Morrison, Abigail and Patil, Kaustubh R.},
      title        = {{E}mpirical {C}omparison between {C}ross-{V}alidation and
                      {M}utation-{V}alidation in {M}odel {S}election},
      publisher    = {arXiv},
      reportid     = {FZJ-2024-01127},
      year         = {2023},
      abstract     = {Mutation validation (MV) is a recently proposed approach
                      for model selection, garnering significant interest due to
                      its unique characteristics and potential benefits compared
                      to the widely used cross-validation (CV) method. In this
                      study, we empirically compared MV and $k$-fold CV using
                      benchmark and real-world datasets. By employing Bayesian
                      tests, we compared generalization estimates yielding three
                      posterior probabilities: practical equivalence, CV
                      superiority, and MV superiority. We also evaluated the
                      differences in the capacity of the selected models and
                      computational efficiency. We found that both MV and CV
                      select models with practically equivalent generalization
                      performance across various machine learning algorithms and
                      the majority of benchmark datasets. MV exhibited advantages
                      in terms of selecting simpler models and lower computational
                      costs. However, in some cases MV selected overly simplistic
                      models leading to underfitting and showed instability in
                      hyperparameter selection. These limitations of MV became
                      more evident in the evaluation of a real-world
                      neuroscientific task of predicting sex at birth using brain
                      functional connectivity.},
      keywords     = {Machine Learning (cs.LG) (Other) / Machine Learning
                      (stat.ML) (Other) / FOS: Computer and information sciences
                      (Other)},
      cin          = {INM-7},
      cid          = {I:(DE-Juel1)INM-7-20090406},
      pnm          = {5254 - Neuroscientific Data Analytics and AI (POF4-525)},
      pid          = {G:(DE-HGF)POF4-5254},
      typ          = {PUB:(DE-HGF)25},
      doi          = {10.48550/ARXIV.2311.14079},
      url          = {https://juser.fz-juelich.de/record/1021988},
}