Rank Selection in Non-negative Matrix Factorization: systematic comparison and a new MAD metric

Muzzarelli, Laura; Eickhoff, Simon; Weis, Susanne; Patil, Kaustubh
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@INPROCEEDINGS{Muzzarelli:861439,
      author       = {Muzzarelli, Laura and Weis, Susanne and Eickhoff, Simon and
                      Patil, Kaustubh},
      title        = {{R}ank {S}election in {N}on-negative {M}atrix
                      {F}actorization: systematic comparison and a new {MAD}
                      metric},
      reportid     = {FZJ-2019-01911},
      pages        = {7},
      year         = {2019},
      note         = {This study was partly supported by the Helmholtz Portfolio
                      Theme "Supercomputing and Modeling for the Human Brain" and
                      the European Union’s Horizon 2020 Research and Innovation
                      Programme under Grant Agreement No. 785907 (HBP SGA2).},
      abstract     = {Non-Negative Matrix Factorization (NMF) is a
                      powerful dimensionality reduction and factorization method
                      that provides a part-based representation of the data. In
                      the absence of a priori knowledge about the latent
                      dimensionality of the data, it is necessary to select a rank
                      of the reduced representation. Several rank selection
                      methods have been proposed, but no consensus exists on when
                      a method is suitable to use. In this work, we propose a new
                      metric for rank selection based on imputation
                      cross-validation, and we systematically compare it against
                      six other metrics while assessing the effects of data
                      properties. Using synthetic datasets with different
                      properties, our work critically evidences that most methods
                      fail to identify the true rank. We show that properties of
                      the data heavily impact the ability of different methods.
                      Imputation-based metrics, including our new MADimput,
                      provided the best accuracy irrespective of the data type,
                      but no solution worked perfectly in all circumstances. One
                      should therefore carefully assess characteristics of their
                      dataset in order to identify the most suitable metric for
                      rank selection.},
      keywords     = {non-negative matrix factorization, rank selection,
                      cross-validation},
      month        = {Jul},
      date         = {2019-07-14},
      organization = {2019 International Joint Conference on Neural Networks,
                      Budapest (Hungary), 14 Jul 2019 - 19 Jul 2019},
      cin          = {INM-7},
      cid          = {I:(DE-Juel1)INM-7-20090406},
      pnm          = {574 - Theory, modelling and simulation (POF3-574) / SMHB -
                      Supercomputing and Modelling for the Human Brain
                      (HGF-SMHB-2013-2017) / HBP SGA2 - Human Brain Project
                      Specific Grant Agreement 2 (785907)},
      pid          = {G:(DE-HGF)POF3-574 / G:(DE-Juel1)HGF-SMHB-2013-2017 /
                      G:(EU-Grant)785907},
      typ          = {PUB:(DE-HGF)8},
      url          = {https://juser.fz-juelich.de/record/861439},
}