Computational simulation of virtual patients reduces dataset bias and improves machine learning-based detection of ARDS from noisy heterogeneous ICU datasets

Sharafutdinov, Konstantin; Hardman, Jonathan G.; Bickenbach, Johannes; Schuppert, Andreas; Fritsch, Sebastian Johannes; Mayer, Hannah; Ghalati, Pejman Farhadi; Polzin, Richard; Iravani, Mina; Marx, Gernot; Bates, Declan G.; Saffaran, Sina
doi:10.1109/OJEMB.2023.3243190
001005293 001__ 1005293
001005293 005__ 20250203103307.0
001005293 0247_ $$2doi$$a10.1109/OJEMB.2023.3243190
001005293 0247_ $$2datacite_doi$$a10.34734/FZJ-2023-01408
001005293 0247_ $$2pmid$$a39184970
001005293 0247_ $$2WOS$$aWOS:001294340500001
001005293 037__ $$aFZJ-2023-01408
001005293 082__ $$a570
001005293 1001_ $$0P:(DE-HGF)0$$aSharafutdinov, Konstantin$$b0$$eCorresponding author
001005293 245__ $$aComputational simulation of virtual patients reduces dataset bias and improves machine learning-based detection of ARDS from noisy heterogeneous ICU datasets
001005293 260__ $$aNew York, NY$$bIEEE$$c2023
001005293 3367_ $$2DRIVER$$aarticle
001005293 3367_ $$2DataCite$$aOutput Types/Journal article
001005293 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1724396888_22840
001005293 3367_ $$2BibTeX$$aARTICLE
001005293 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001005293 3367_ $$00$$2EndNote$$aJournal Article
001005293 520__ $$aGoal: Machine learning (ML) technologies that leverage large-scale patient data are promising tools for predicting disease evolution in individual patients. However, the limited generalizability of ML models developed on single-center datasets, and their unproven performance in real-world settings, remain significant constraints to their widespread adoption in clinical practice. One approach to tackle this issue is to base learning on large multi-center datasets. However, such heterogeneous datasets can introduce further biases driven by data origin, as data structures and patient cohorts may differ between hospitals. Methods: In this paper, we demonstrate how mechanistic virtual patient (VP) modeling can be used to capture specific features of patients’ states and dynamics, while reducing biases introduced by heterogeneous datasets. We show how VP modeling can be used for data augmentation through identification of individualized model parameters approximating disease states of patients with suspected acute respiratory distress syndrome (ARDS) from observational data of mixed origin. We compare the results of an unsupervised learning method (clustering) in two cases: where the learning is based on original patient data and on data derived in the matching procedure of the VP model to real patient data. Results: More robust cluster configurations were observed in clustering using the model-derived data. VP model-based clustering also reduced biases introduced by the inclusion of data from different hospitals and was able to discover an additional cluster with significant ARDS enrichment. Conclusions: Our results indicate that mechanistic VP modeling can be used to significantly reduce biases introduced by learning from heterogeneous datasets and to allow improved discovery of patient cohorts driven exclusively by medical conditions.
001005293 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001005293 536__ $$0G:(BMBF)01ZZ1803M$$aSMITH - Medizininformatik-Konsortium - Beitrag Forschungszentrum Jülich (01ZZ1803M)$$c01ZZ1803M$$x1
001005293 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001005293 7001_ $$0P:(DE-Juel1)185651$$aFritsch, Sebastian Johannes$$b1$$ufzj
001005293 7001_ $$0P:(DE-HGF)0$$aIravani, Mina$$b2
001005293 7001_ $$0P:(DE-HGF)0$$aGhalati, Pejman Farhadi$$b3
001005293 7001_ $$0P:(DE-HGF)0$$aSaffaran, Sina$$b4
001005293 7001_ $$00000-0003-1395-9846$$aBates, Declan G.$$b5
001005293 7001_ $$0P:(DE-HGF)0$$aHardman, Jonathan G.$$b6
001005293 7001_ $$0P:(DE-HGF)0$$aPolzin, Richard$$b7
001005293 7001_ $$0P:(DE-HGF)0$$aMayer, Hannah$$b8
001005293 7001_ $$0P:(DE-HGF)0$$aMarx, Gernot$$b9
001005293 7001_ $$0P:(DE-HGF)0$$aBickenbach, Johannes$$b10
001005293 7001_ $$0P:(DE-HGF)0$$aSchuppert, Andreas$$b11
001005293 773__ $$0PERI:(DE-600)3012072-X$$a10.1109/OJEMB.2023.3243190$$gp. 1 - 11$$p611 - 620$$tIEEE open journal of engineering in medicine and biology$$v5$$x2644-1276$$y2023
001005293 8564_ $$uhttps://juser.fz-juelich.de/record/1005293/files/Computational_Simulation_of_Virtual_Patients_Reduces_Dataset_Bias_and_Improves_Machine_Learning-Based_Detection_of_ARDS_from_Noisy_Heterogeneous_ICU_Datasets.pdf$$yOpenAccess
001005293 8564_ $$uhttps://juser.fz-juelich.de/record/1005293/files/Computational_Simulation_of_Virtual_Patients_Reduces_Dataset_Bias_and_Improves_Machine_Learning-Based_Detection_of_ARDS_from_Noisy_Heterogeneous_ICU_Datasets.gif?subformat=icon$$xicon$$yOpenAccess
001005293 8564_ $$uhttps://juser.fz-juelich.de/record/1005293/files/Computational_Simulation_of_Virtual_Patients_Reduces_Dataset_Bias_and_Improves_Machine_Learning-Based_Detection_of_ARDS_from_Noisy_Heterogeneous_ICU_Datasets.jpg?subformat=icon-1440$$xicon-1440$$yOpenAccess
001005293 8564_ $$uhttps://juser.fz-juelich.de/record/1005293/files/Computational_Simulation_of_Virtual_Patients_Reduces_Dataset_Bias_and_Improves_Machine_Learning-Based_Detection_of_ARDS_from_Noisy_Heterogeneous_ICU_Datasets.jpg?subformat=icon-180$$xicon-180$$yOpenAccess
001005293 8564_ $$uhttps://juser.fz-juelich.de/record/1005293/files/Computational_Simulation_of_Virtual_Patients_Reduces_Dataset_Bias_and_Improves_Machine_Learning-Based_Detection_of_ARDS_from_Noisy_Heterogeneous_ICU_Datasets.jpg?subformat=icon-640$$xicon-640$$yOpenAccess
001005293 909CO $$ooai:juser.fz-juelich.de:1005293$$pdnbdelivery$$pdriver$$pVDB$$popen_access$$popenaire
001005293 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)185651$$aForschungszentrum Jülich$$b1$$kFZJ
001005293 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001005293 9141_ $$y2024
001005293 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2022-11-26
001005293 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2022-11-26
001005293 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2022-11-26
001005293 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001005293 915__ $$0StatID:(DE-HGF)0112$$2StatID$$aWoS$$bEmerging Sources Citation Index$$d2022-11-26
001005293 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2021-01-15T11:06:51Z
001005293 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2021-01-15T11:06:51Z
001005293 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2022-11-26
001005293 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2022-11-26
001005293 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001005293 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Blind peer review$$d2021-01-15T11:06:51Z
001005293 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2022-11-26
001005293 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2022-11-26
001005293 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2022-11-26
001005293 920__ $$lno
001005293 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001005293 980__ $$ajournal
001005293 980__ $$aVDB
001005293 980__ $$aUNRESTRICTED
001005293 980__ $$aI:(DE-Juel1)JSC-20090406
001005293 9801_ $$aFullTexts