Systematic misestimation of machine learning performance in neuroimaging studies of depression

Flint, Claas; Redlich, Ronny; Arolt, Volker; Hahn, Tim; Eickhoff, Simon B.; Opel, Nils; Krug, Axel; Leenings, Ramona; Clark, Scott; Dannlowski, Udo; Kircher, Tilo; Winter, Nils R.; Jiang, Xiaoyi; Baune, Bernhard T.; Mehler, David M. A.; Cearns, Micah; Nenadic, Igor; Emden, Daniel
doi:10.1038/s41386-021-01020-7
000892632 001__ 892632
000892632 005__ 20230515091803.0
000892632 0247_ $$2doi$$a10.1038/s41386-021-01020-7
000892632 0247_ $$2ISSN$$a0893-133X
000892632 0247_ $$2ISSN$$a1740-634X
000892632 0247_ $$2Handle$$a2128/28282
000892632 0247_ $$2altmetric$$aaltmetric:105599429
000892632 0247_ $$2pmid$$a33958703
000892632 0247_ $$2WOS$$aWOS:000647877800001
000892632 037__ $$aFZJ-2021-02221
000892632 082__ $$a610
000892632 1001_ $$00000-0001-5164-8227$$aFlint, Claas$$b0
000892632 245__ $$aSystematic misestimation of machine learning performance in neuroimaging studies of depression
000892632 260__ $$aBasingstoke$$bNature Publishing Group$$c2021
000892632 3367_ $$2DRIVER$$aarticle
000892632 3367_ $$2DataCite$$aOutput Types/Journal article
000892632 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1626785232_8199
000892632 3367_ $$2BibTeX$$aARTICLE
000892632 3367_ $$2ORCID$$aJOURNAL_ARTICLE
000892632 3367_ $$00$$2EndNote$$aJournal Article
000892632 520__ $$aWe currently observe a disconcerting phenomenon in machine learning studies in psychiatry: While we would expect larger samples to yield better results due to the availability of more data, larger machine learning studies consistently show much weaker performance than the numerous small-scale studies. Here, we systematically investigated this effect focusing on one of the most heavily studied questions in the field, namely the classification of patients suffering from Major Depressive Disorder (MDD) and healthy controls based on neuroimaging data. Drawing upon structural MRI data from a balanced sample of N = 1868 MDD patients and healthy controls from our recent international Predictive Analytics Competition (PAC), we first trained and tested a classification model on the full dataset which yielded an accuracy of 61%. Next, we mimicked the process by which researchers would draw samples of various sizes (N = 4 to N = 150) from the population and showed a strong risk of misestimation. Specifically, for small sample sizes (N = 20), we observe accuracies of up to 95%. For medium sample sizes (N = 100) accuracies up to 75% were found. Importantly, further investigation showed that sufficiently large test sets effectively protect against performance misestimation whereas larger datasets per se do not. While these results question the validity of a substantial part of the current literature, we outline the relatively low-cost remedy of larger test sets, which is readily available in most cases.
000892632 536__ $$0G:(DE-HGF)POF4-525$$a525 - Decoding Brain Organization and Dysfunction (POF4-525)$$cPOF4-525$$fPOF IV$$x0
000892632 542__ $$2Crossref$$i2021-05-06$$uhttps://creativecommons.org/licenses/by/4.0
000892632 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
000892632 7001_ $$00000-0002-3353-8566$$aCearns, Micah$$b1
000892632 7001_ $$0P:(DE-HGF)0$$aOpel, Nils$$b2
000892632 7001_ $$0P:(DE-HGF)0$$aRedlich, Ronny$$b3
000892632 7001_ $$0P:(DE-HGF)0$$aMehler, David M. A.$$b4
000892632 7001_ $$0P:(DE-HGF)0$$aEmden, Daniel$$b5
000892632 7001_ $$0P:(DE-HGF)0$$aWinter, Nils R.$$b6
000892632 7001_ $$0P:(DE-HGF)0$$aLeenings, Ramona$$b7
000892632 7001_ $$0P:(DE-Juel1)131678$$aEickhoff, Simon B.$$b8
000892632 7001_ $$0P:(DE-HGF)0$$aKircher, Tilo$$b9
000892632 7001_ $$00000-0002-0564-2497$$aKrug, Axel$$b10
000892632 7001_ $$0P:(DE-HGF)0$$aNenadic, Igor$$b11
000892632 7001_ $$0P:(DE-HGF)0$$aArolt, Volker$$b12
000892632 7001_ $$0P:(DE-HGF)0$$aClark, Scott$$b13
000892632 7001_ $$0P:(DE-HGF)0$$aBaune, Bernhard T.$$b14
000892632 7001_ $$0P:(DE-HGF)0$$aJiang, Xiaoyi$$b15
000892632 7001_ $$0P:(DE-HGF)0$$aDannlowski, Udo$$b16$$eCorresponding author
000892632 7001_ $$0P:(DE-HGF)0$$aHahn, Tim$$b17
000892632 77318 $$2Crossref$$3journal-article$$a10.1038/s41386-021-01020-7$$bSpringer Science and Business Media LLC$$d2021-05-06$$n8$$p1510-1517$$tNeuropsychopharmacology$$v46$$x0893-133X$$y2021
000892632 773__ $$0PERI:(DE-600)2008300-2$$a10.1038/s41386-021-01020-7$$n8$$p1510-1517$$tNeuropsychopharmacology$$v46$$x0893-133X$$y2021
000892632 8564_ $$uh
000892632 8564_ $$uhttps://juser.fz-juelich.de/record/892632/files/s41386-021-01020-7-1.pdf$$yOpenAccess
000892632 909CO $$ooai:juser.fz-juelich.de:892632$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
000892632 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)131678$$aForschungszentrum Jülich$$b8$$kFZJ
000892632 9131_ $$0G:(DE-HGF)POF4-525$$1G:(DE-HGF)POF4-520$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lNatural, Artificial and Cognitive Information Processing$$vDecoding Brain Organization and Dysfunction$$x0
000892632 9130_ $$0G:(DE-HGF)POF3-574$$1G:(DE-HGF)POF3-570$$2G:(DE-HGF)POF3-500$$3G:(DE-HGF)POF3$$4G:(DE-HGF)POF$$aDE-HGF$$bKey Technologies$$lDecoding the Human Brain$$vTheory, modelling and simulation$$x0
000892632 9141_ $$y2021
000892632 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS
000892632 915__ $$0StatID:(DE-HGF)1030$$2StatID$$aDBCoverage$$bCurrent Contents - Life Sciences
000892632 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
000892632 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search
000892632 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bNEUROPSYCHOPHARMACOL : 2015
000892632 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection
000892632 915__ $$0StatID:(DE-HGF)0110$$2StatID$$aWoS$$bScience Citation Index
000892632 915__ $$0StatID:(DE-HGF)0111$$2StatID$$aWoS$$bScience Citation Index Expanded
000892632 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
000892632 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC
000892632 915__ $$0StatID:(DE-HGF)9905$$2StatID$$aIF >= 5$$bNEUROPSYCHOPHARMACOL : 2015
000892632 915__ $$0StatID:(DE-HGF)0310$$2StatID$$aDBCoverage$$bNCBI Molecular Biology Database
000892632 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews
000892632 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline
000892632 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bThomson Reuters Master Journal List
000892632 920__ $$lyes
000892632 9201_ $$0I:(DE-Juel1)INM-7-20090406$$kINM-7$$lGehirn & Verhalten$$x0
000892632 980__ $$ajournal
000892632 980__ $$aVDB
000892632 980__ $$aUNRESTRICTED
000892632 980__ $$aI:(DE-Juel1)INM-7-20090406
000892632 9801_ $$aFullTexts
000892632 999C5 $$1AM Darcy$$2Crossref$$9-- missing cx lookup --$$a10.1001/jama.2015.18421$$p551 -$$tJ Am Med Assoc$$uDarcy AM, Louie AK, Roberts LW. Machine learning and the profession of medicine. J Am Med Assoc. 2016;315:551–52.$$v315$$y2016
000892632 999C5 $$1HA Eyre$$2Crossref$$9-- missing cx lookup --$$a10.1002/wps.20297$$p21 -$$tWorld Psychiatry$$uEyre HA, Singh AB, Reynolds C. Tech giants enter mental health. World Psychiatry. 2016;15:21–22.$$v15$$y2016
000892632 999C5 $$1JDE Gabrieli$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.neuron.2014.10.047$$p11 -$$tNeuron.$$uGabrieli JDE, Ghosh SS, Whitfield-Gabrieli S. Prediction as a humanitarian and pragmatic contribution from human cognitive neuroscience. Neuron. 2015;85:11–26.$$v85$$y2015
000892632 999C5 $$1MI Jordan$$2Crossref$$9-- missing cx lookup --$$a10.1126/science.aaa8415$$p255 -$$tScience.$$uJordan MI, Mitchell TM. Machine learning: Trends, perspectives, and prospects. Science. 2015;349:255–60.$$v349$$y2015
000892632 999C5 $$1T Hahn$$2Crossref$$9-- missing cx lookup --$$a10.1038/mp.2016.201$$p37 -$$tMol Psychiatry.$$uHahn T, Nierenberg AA, Whitfield-Gabrieli S. Predictive analytics in mental health: applications, guidelines, challenges and perspectives. Mol Psychiatry. 2017;22:37–43.$$v22$$y2017
000892632 999C5 $$1BA Johnston$$2Crossref$$uJohnston BA, Steele JD, Tolomeo S, Christmas D, Matthews K. Structural MRI-based predictions in patients with treatment-refractory depression (TRD). PLoS One. 2015;10:1–16.$$y2015
000892632 999C5 $$1B Mwangi$$2Crossref$$9-- missing cx lookup --$$a10.1093/brain/aws084$$p1508 -$$tBrain.$$uMwangi B, Ebmeier KP, Matthews K, Douglas Steele J. Multi-centre diagnostic classification of individual structural neuroimaging scans from patients with major depressive disorder. Brain. 2012;135:1508–21.$$v135$$y2012
000892632 999C5 $$1MJ Patel$$2Crossref$$9-- missing cx lookup --$$a10.1002/gps.4262$$p1056 -$$tInt J Geriatr Psychiatry.$$uPatel MJ, Andreescu C, Price JC, Edelman KL, Reynolds CF, Aizenstein HJ. Machine learning approaches for integrating clinical and imaging features in late-life depression classification and response prediction. Int J Geriatr Psychiatry. 2015;30:1056–67.$$v30$$y2015
000892632 999C5 $$1AH Neuhaus$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.biopsych.2017.09.032$$pe81 -$$tBiol Psychiatry.$$uNeuhaus AH, Popescu FC. Sample Size, Model Robustness, and Classification Accuracy in Diagnostic Multivariate Neuroimaging Analyses. Biol Psychiatry. 2018;84:e81–e82.$$v84$$y2018
000892632 999C5 $$1MR Arbabshirani$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.neuroimage.2016.02.079$$p137 -$$tNeuroimage.$$uArbabshirani MR, Plis S, Sui J, Calhoun VD. Single subject prediction of brain disorders in neuroimaging: Promises and pitfalls. Neuroimage. 2017;145:137–65.$$v145$$y2017
000892632 999C5 $$1S Raudys$$2Crossref$$9-- missing cx lookup --$$a10.1109/34.75512$$p252 -$$tIEEE Trans Pattern Anal Mach Intell$$uRaudys S, Jain A. Small Sample Size Effects in Statistical Pattern Recognition: Recommendations for Practitioners. IEEE Trans Pattern Anal Mach Intell. 1991;13:252–64.$$v13$$y1991
000892632 999C5 $$1T van der Ploeg$$2Crossref$$9-- missing cx lookup --$$a10.1186/1471-2288-14-137$$tBMC Med Res Methodol$$uvan der Ploeg T, Austin PC, Steyerberg EW. Modern modelling techniques are data hungry: a simulation study for predicting dichotomous endpoints. BMC Med Res Methodol. 2014;14:137.$$v14$$y2014
000892632 999C5 $$1J Kambeitz$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.biopsych.2016.10.028$$p330 -$$tBiol Psychiatry.$$uKambeitz J, Cabral C, Sacchet MD, Gotlib IH, Zahn R, Serpa MH, et al. Detecting Neuroimaging Biomarkers for Depression: A Meta-analysis of Multivariate Pattern Recognition Studies. Biol Psychiatry. 2017;82:330–38.$$v82$$y2017
000892632 999C5 $$1G Varoquaux$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.neuroimage.2016.10.038$$p166 -$$tNeuroimage.$$uVaroquaux G, Raamana PR, Engemann DA, Hoyos-Idrobo A, Schwartz Y, Thirion B. Assessing and tuning brain decoders: cross-validation, caveats, and guidelines. Neuroimage. 2017;145:166–79.$$v145$$y2017
000892632 999C5 $$2Crossref$$9-- missing cx lookup --$$a10.31219/OSF.IO/UZEHJ$$uHahn T, Ebner-Priemer U, Meyer-Lindenberg A. Transparent Artificial Intelligence – A Conceptual Framework for Evaluating AI-based Clinical Decision Support Systems. OSF Prepr. 2019. https://doi.org/10.31219/OSF.IO/UZEHJ.
000892632 999C5 $$1G Varoquaux$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.neuroimage.2017.06.061$$p68 -$$tNeuroimage.$$uVaroquaux G. Cross-validation failure: small sample sizes lead to large error bars. Neuroimage. 2018;180:68–77.$$v180$$y2018
000892632 999C5 $$1U Dannlowski$$2Crossref$$9-- missing cx lookup --$$a10.1038/npp.2015.86$$p2510 -$$tNeuropsychopharmacology.$$uDannlowski U, Kugel H, Grotegerd D, Redlich R, Suchy J, Opel N, et al. NCAN cross-disorder risk variant is associated with limbic gray matter deficits in healthy subjects and major depression. Neuropsychopharmacology. 2015;40:2510–16.$$v40$$y2015
000892632 999C5 $$1U Dannlowski$$2Crossref$$9-- missing cx lookup --$$a10.1038/mp.2014.39$$p398 -$$tMol Psychiatry.$$uDannlowski U, Grabe HJ, Wittfeld K, Klaus J, Konrad C, Grotegerd D, et al. Multimodal imaging of a tescalcin (TESC)-regulating polymorphism (rs7294919)-specific effects on hippocampal gray matter structure. Mol Psychiatry. 2015;20:398–404.$$v20$$y2015
000892632 999C5 $$2Crossref$$9-- missing cx lookup --$$a10.1007/s00406-018-0943-x$$uKircher T, Wöhr M, Nenadic I, Schwarting R, Schratt G, Alferink J, et al. Neurobiology of the major psychoses: a translational perspective on brain structure and function—the FOR2107 consortium. Eur Arch Psychiatry Clin Neurosci. 2018:1–14.
000892632 999C5 $$2Crossref$$uWittchen H-U, Wunderlich U, Gruschwitz S, Zaudig M. SKID I. Strukturiertes Klinisches Interview für DSM-IV. Achse I: Psychische Störungen. Interviewheft und Beurteilungsheft. Eine deutschsprachige, erweiterte Bearb. d. amerikanischen Originalversion des SKID I. Göttingen: Hogrefe; 1997.
000892632 999C5 $$1C Vogelbacher$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.neuroimage.2018.01.079$$p450 -$$tNeuroimage.$$uVogelbacher C, Möbius TWD, Sommer J, Schuster V, Dannlowski U, Kircher T, et al. The Marburg-Münster Affective Disorders Cohort Study (MACS): A quality assurance protocol for MR neuroimaging data. Neuroimage. 2018;172:450–460.$$v172$$y2018
000892632 999C5 $$1F Pedregosa$$2Crossref$$uPedregosa F, Varoquaux G, Gramfort A, Michel V, Thirion B, Grisel O, et al. Scikit-learn: Machine Learning in Python. J Mach Learn Res. 2012;12:2825–30.$$y2012
000892632 999C5 $$1AF Marquand$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.biopsych.2015.12.023$$p552 -$$tBiol Psychiatry.$$uMarquand AF, Rezek I, Buitelaar J, Beckmann CF. Understanding heterogeneity in clinical cohorts using normative models: beyond case-control studies. Biol Psychiatry. 2016;80:552–61.$$v80$$y2016
000892632 999C5 $$1HG Schnack$$2Crossref$$9-- missing cx lookup --$$a10.3389/fpsyt.2016.00050$$p1 -$$tFront Psychiatry$$uSchnack HG, Kahn RS. Detecting neuroimaging biomarkers for psychiatric disorders: sample size matters. Front Psychiatry. 2016;7:1–12.$$v7$$y2016
000892632 999C5 $$1E Combrisson$$2Crossref$$9-- missing cx lookup --$$a10.1016/j.jneumeth.2015.01.010$$p126 -$$tJ Neurosci Methods.$$uCombrisson E, Jerbi K. Exceeding chance level by chance: the caveat of theoretical chance levels in brain signal classification and statistical assessment of decoding accuracy. J Neurosci Methods. 2015;250:126–36.$$v250$$y2015
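
The abstract's central claim, that small test sets can report accuracies far above a model's real performance while large test sets cannot, can be illustrated with a toy simulation. The sketch below is not the authors' procedure (they drew subsamples from the real PAC data); it assumes a classifier whose true accuracy is fixed at the 61% reported for the full dataset and treats each test prediction as an independent Bernoulli trial. The function names, the number of simulated studies, and the test-set sizes are all hypothetical choices made for illustration.

```python
import random
import statistics


def simulate_observed_accuracies(true_accuracy, test_size, n_studies, seed=0):
    """Simulate the test accuracy reported by many independent studies.

    Assumption: every study evaluates a model with the same true accuracy,
    and each test prediction is an independent Bernoulli trial.
    """
    rng = random.Random(seed)
    accuracies = []
    for _ in range(n_studies):
        # Count how many of the test_size predictions happen to be correct.
        correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        accuracies.append(correct / test_size)
    return accuracies


def summarize(accuracies):
    """Return the mean, spread, and extremes of the simulated accuracies."""
    return {
        "min": min(accuracies),
        "mean": statistics.fmean(accuracies),
        "max": max(accuracies),
        "sd": statistics.pstdev(accuracies),
    }


if __name__ == "__main__":
    TRUE_ACCURACY = 0.61  # full-dataset accuracy quoted in the abstract
    for test_size in (20, 100, 1000):  # illustrative test-set sizes
        stats = summarize(
            simulate_observed_accuracies(TRUE_ACCURACY, test_size, n_studies=2000)
        )
        print(
            f"test N={test_size:5d}  mean={stats['mean']:.3f}  "
            f"sd={stats['sd']:.3f}  min={stats['min']:.2f}  max={stats['max']:.2f}"
        )
```

Under these assumptions the spread of observed accuracies shrinks roughly with the square root of the test-set size, so the most optimistic results cluster in the smallest test sets even though the underlying model never changes.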