TY  - JOUR
AU  - Chen, Zhiyi
AU  - Hu, Bowen
AU  - Liu, Xuerong
AU  - Becker, Benjamin
AU  - Eickhoff, Simon B.
AU  - Miao, Kuan
AU  - Gu, Xingmei
AU  - Tang, Yancheng
AU  - Dai, Xin
AU  - Li, Chao
AU  - Leonov, Artemiy
AU  - Xiao, Zhibing
AU  - Feng, Zhengzhi
AU  - Chen, Ji
AU  - Chuan-Peng, Hu
TI  - Sampling inequalities affect generalization of neuroimaging-based diagnostic classifiers in psychiatry
JO  - BMC Medicine
VL  - 21
IS  - 1
SN  - 1741-7015
CY  - Heidelberg [u.a.]
PB  - Springer
M1  - FZJ-2023-03482
SP  - 241
PY  - 2023
N1  - This work was supported by the PLA Key Research Foundation (CWS20J007), PLA Talent Program Foundation (2022160258), the STI2030-Major Projects (No. 2022ZD0214000), the National Key R&D Program of China (No. 2021YFC2502200) and the National Natural Science Foundation of China (No. 82201658).
AB  - Background: The development of machine learning models for aiding in the diagnosis of mental disorder is recognized as a significant breakthrough in the field of psychiatry. However, clinical practice of such models remains a challenge, with poor generalizability being a major limitation. Methods: Here, we conducted a pre-registered meta-research assessment on neuroimaging-based models in the psychiatric literature, quantitatively examining global and regional sampling issues over recent decades, from a view that has been relatively underexplored. A total of 476 studies (n = 118,137) were included in the current assessment. Based on these findings, we built a comprehensive 5-star rating system to quantitatively evaluate the quality of existing machine learning models for psychiatric diagnoses. Results: A global sampling inequality in these models was revealed quantitatively (sampling Gini coefficient (G) = 0.81, p < .01), varying across different countries (regions) (e.g., China, G = 0.47; the USA, G = 0.58; Germany, G = 0.78; the UK, G = 0.87). Furthermore, the severity of this sampling inequality was significantly predicted by national economic levels (β = −2.75, p < .001, R²adj = 0.40; r = −.84, 95% CI: −.41 to −.97), and was plausibly predictable for model performance, with higher sampling inequality for reporting higher classification accuracy. Further analyses showed that lack of independent testing (84.24% of models, 95% CI: 81.0–87.5%), improper cross-validation (51.68% of models, 95% CI: 47.2–56.2%), and poor technical transparency (87.8% of models, 95% CI: 84.9–90.8%)/availability (80.88% of models, 95% CI: 77.3–84.4%) are prevailing in current diagnostic classifiers despite improvements over time. Relating to these observations, model performances were found decreased in studies with independent cross-country sampling validations (all p < .001, BF10 > 15). In light of this, we proposed a purpose-built quantitative assessment checklist, which demonstrated that the overall ratings of these models increased by publication year but were negatively associated with model performance. Conclusions: Together, improving sampling economic equality and hence the quality of machine learning models may be a crucial facet to plausibly translating neuroimaging-based diagnostic classifiers into clinical practice. Keywords: Psychiatric machine learning, Diagnostic classification, Meta-analysis, Neuroimaging, Sampling inequalities
LB  - PUB:(DE-HGF)16
C6  - 37400814
UR  - <Go to ISI:>//WOS:001022895400003
DO  - 10.1186/s12916-023-02941-4
UR  - https://juser.fz-juelich.de/record/1014812
ER  -