%0 Journal Article
%A Kambeitz, Joseph
%A Schiffman, Jason
%A Kambeitz-Ilankovic, Lana
%A Mittal, Vijay A.
%A Ettinger, Ulrich
%A Vogeley, Kai
%T The empirical structure of psychopathology is represented in large language models
%J Nature Mental Health
%V 3
%N 12
%@ 2731-6076
%C London
%I Nature Publishing Group UK
%M FZJ-2025-05709
%P 1482 - 1492
%D 2025
%Z The original studies analyzed in this work were supported by the National Institute of Mental Health (Grant R01MH112612) to J.S. and the Deutsche Forschungsgemeinschaft (DFG; Grant ET 31/7-1) to U.E. K.V. was supported within the project SIMSUB (Grant 01GP2215) of the German Ministry of Research, Technology and Space (BMFTR). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.
%X Clinical assessment and scientific research in psychiatry are largely based on questionnaires that are used to assess psychopathology. The development of large language models (LLMs) offers a new perspective for analysis of the language and terminology on which these questionnaires are based. We used state-of-the-art LLMs to derive numerical representations (‘text embeddings’) of the semantic and sentiment content of items from established questionnaires for the assessment of psychopathology. We compared the pairwise associations between empirical data from cross-sectional studies and text embeddings to test whether the empirical structure of psychopathology can be reconstructed by LLMs. Across four large-scale datasets (n = 1,555, n = 1,099, n = 11,807 and n = 39,755), we found a range of significant correlations between empirical item-pair associations and associations derived from text embeddings (r = 0.18 to r = 0.57, all P < 0.05). Random forest regression models based on semantic or sentiment embeddings predicted empirical item-pair associations with moderate to high accuracy (r = 0.33 to r = 0.81, all P < 0.05). Similarly, empirical clustering of items and grouping to established subdomain scores could be partly reconstructed by text embeddings. Our results demonstrate that LLMs are able to represent substantial components of the empirical structure of psychopathology. Consequently, the integration of LLMs into mental health research has the potential to unlock numerous promising avenues. These may encompass improving the process of developing questionnaires, optimizing generalizability and reducing the redundancy of existing questionnaires or facilitating the development of new conceptualizations of mental disorders.
%F PUB:(DE-HGF)16
%9 Journal Article
%R 10.1038/s44220-025-00527-y
%U https://juser.fz-juelich.de/record/1049995