Journal Article FZJ-2025-05709

The empirical structure of psychopathology is represented in large language models

2025
Nature Publishing Group UK, London

Nature Mental Health 3(12), 1482-1492 [10.1038/s44220-025-00527-y]

Abstract: Clinical assessment and scientific research in psychiatry are largely based on questionnaires that are used to assess psychopathology. The development of large language models (LLMs) offers a new perspective for analysis of the language and terminology on which these questionnaires are based. We used state-of-the-art LLMs to derive numerical representations (‘text embeddings’) of the semantic and sentiment content of items from established questionnaires for the assessment of psychopathology. We compared the pairwise associations between empirical data from cross-sectional studies and text embeddings to test whether the empirical structure of psychopathology can be reconstructed by LLMs. Across four large-scale datasets (n = 1,555, n = 1,099, n = 11,807 and n = 39,755), we found a range of significant correlations between empirical item-pair associations and associations derived from text embeddings (r = 0.18 to r = 0.57, all P < 0.05). Random forest regression models based on semantic or sentiment embeddings predicted empirical item-pair associations with moderate to high accuracy (r = 0.33 to r = 0.81, all P < 0.05). Similarly, empirical clustering of items and grouping to established subdomain scores could be partly reconstructed by text embeddings. Our results demonstrate that LLMs are able to represent substantial components of the empirical structure of psychopathology. Consequently, the integration of LLMs into mental health research has the potential to unlock numerous promising avenues. These may encompass improving the process of developing questionnaires, optimizing generalizability and reducing the redundancy of existing questionnaires or facilitating the development of new conceptualizations of mental disorders.
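The item-pair comparison described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the use of cosine similarity and Pearson correlation, and the toy numbers are all assumptions made for illustration. The idea is to compute one association per item pair from the text embeddings and one from the empirical responses, then correlate the two lists.

```python
import math
from itertools import combinations


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def item_pair_agreement(embeddings, responses):
    """Correlate embedding-based with empirical item-pair associations.

    embeddings: one vector per questionnaire item (hypothetical input)
    responses:  one row per participant, one column per item
    """
    n_items = len(embeddings)
    # Transpose the response table so each item becomes one column list.
    columns = [[row[j] for row in responses] for j in range(n_items)]
    embedding_assoc, empirical_assoc = [], []
    for i, j in combinations(range(n_items), 2):
        embedding_assoc.append(cosine_similarity(embeddings[i], embeddings[j]))
        empirical_assoc.append(pearson_r(columns[i], columns[j]))
    return pearson_r(embedding_assoc, empirical_assoc)


# Made-up toy data: four items with 2-dimensional "embeddings" and
# five participants; real embeddings have far more dimensions.
toy_embeddings = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
toy_responses = [
    [3, 3, 0, 1],
    [2, 3, 1, 0],
    [0, 1, 3, 3],
    [1, 0, 2, 3],
    [3, 2, 1, 1],
]
agreement = item_pair_agreement(toy_embeddings, toy_responses)
print(f"item-pair agreement r = {agreement:.2f}")
```

In this toy setup, items with similar embeddings were also given correlated responses, so the agreement comes out strongly positive. With real questionnaire data the embeddings would come from an LLM and the result would depend on the dataset and the association measure chosen.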

Note: The original studies analyzed in this work were supported by the National Institute of Mental Health (Grant R01MH112612) to J.S. and the Deutsche Forschungsgemeinschaft (DFG) ET 31/7-1 to U.E. K.V. was supported within the project SIMSUB (Grant 01GP2215) of the German Ministry of Research, Technology and Space (BMFTR). The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

Contributing Institute(s):
  1. Kognitive Neurowissenschaften (INM-3)
Research Program(s):
  1. 5251 - Multilevel Brain Organization and Variability (POF4-525)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; OpenAccess ; DEAL Nature


 Record created 2025-12-19, last modified 2025-12-19

