001048464 001__ 1048464
001048464 005__ 20251212202212.0
001048464 0247_ $$2doi$$a10.1371/journal.pcbi.1013679
001048464 0247_ $$2ISSN$$a1553-734X
001048464 0247_ $$2ISSN$$a1553-7358
001048464 0247_ $$2datacite_doi$$a10.34734/FZJ-2025-04662
001048464 037__ $$aFZJ-2025-04662
001048464 082__ $$a610
001048464 1001_ $$0P:(DE-HGF)0$$aFlöge, Klemens$$b0
001048464 245__ $$aOneProt: Towards multi-modal protein foundation models via latent space alignment of sequence, structure, binding sites and text encoders
001048464 260__ $$aSan Francisco, Calif.$$bPublic Library of Science$$c2025
001048464 3367_ $$2DRIVER$$aarticle
001048464 3367_ $$2DataCite$$aOutput Types/Journal article
001048464 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1765560250_32338
001048464 3367_ $$2BibTeX$$aARTICLE
001048464 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001048464 3367_ $$00$$2EndNote$$aJournal Article
001048464 520__ $$aRecent advances in Artificial Intelligence have enabled multi-modal systems to model and translate diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal Deep Learning model for proteins that integrates structural, sequence, text, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of protein modality encoders in a lightweight fine-tuning scheme that focuses on pairwise alignment with sequence data, rather than requiring full matches. This novel approach comprises a mix of Graph Neural Networks and transformer architectures. It demonstrates good performance in retrieval tasks and showcases the efficacy of multi-modal systems in Protein Machine Learning through a broad spectrum of downstream baselines, including enzyme function prediction and binding site analysis. Furthermore, OneProt enables the transfer of representational information from specialized encoders to the sequence encoder, enhancing capabilities for distinguishing evolutionarily related and unrelated sequences and exhibiting representational properties where evolutionarily related proteins align in similar directions within the latent space. In addition, we extensively investigate modality ablations to identify the encoders that contribute the most to predictive performance, highlighting the significance of the binding site encoder, which has not been used in similar models previously. This work expands the horizons of multi-modal protein models, paving the way for transformative applications in drug discovery, biocatalytic reaction planning, and protein engineering.
001048464 536__ $$0G:(DE-HGF)POF4-2171$$a2171 - Biological and environmental resources for sustainable use (POF4-217)$$cPOF4-217$$fPOF IV$$x0
001048464 536__ $$0G:(DE-HGF)POF4-5112$$a5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x1
001048464 536__ $$0G:(DE-Juel-1)E54.303.11$$aHelmholtz AI Consultant Team FB Information (E54.303.11)$$cE54.303.11$$x2
001048464 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001048464 7001_ $$0P:(DE-HGF)0$$aUdayakumar, Srisruthi$$b1
001048464 7001_ $$0P:(DE-HGF)0$$aSommer, Johanna$$b2
001048464 7001_ $$0P:(DE-HGF)0$$aPiraud, Marie$$b3
001048464 7001_ $$0P:(DE-Juel1)185654$$aKesselheim, Stefan$$b4$$ufzj
001048464 7001_ $$0P:(DE-HGF)0$$aFortuin, Vincent$$b5
001048464 7001_ $$0P:(DE-HGF)0$$aGünnemann, Stephan$$b6
001048464 7001_ $$0P:(DE-Juel1)164893$$avan der Weg, Karel J.$$b7
001048464 7001_ $$0P:(DE-Juel1)172663$$aGohlke, Holger$$b8
001048464 7001_ $$0P:(DE-HGF)0$$aMerdivan, Erinc$$b9
001048464 7001_ $$0P:(DE-Juel1)192120$$aBazarova, Alina$$b10$$eCorresponding author
001048464 773__ $$0PERI:(DE-600)2193340-6$$a10.1371/journal.pcbi.1013679$$gVol. 21, no. 11, p. e1013679$$n11$$pe1013679$$tPLoS Computational Biology$$v21$$x1553-734X$$y2025
001048464 8564_ $$uhttps://juser.fz-juelich.de/record/1048464/files/journal.pcbi.1013679-2.pdf$$yOpenAccess
001048464 8767_ $$d2025-12-12$$eAPC$$jPayment made
001048464 909CO $$ooai:juser.fz-juelich.de:1048464$$pdnbdelivery$$popenCost$$pVDB$$popenaire$$pdriver$$pOpenAPC$$popen_access
001048464 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)185654$$aForschungszentrum Jülich$$b4$$kFZJ
001048464 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)172663$$aForschungszentrum Jülich$$b8$$kFZJ
001048464 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)192120$$aForschungszentrum Jülich$$b10$$kFZJ
001048464 9131_ $$0G:(DE-HGF)POF4-217$$1G:(DE-HGF)POF4-210$$2G:(DE-HGF)POF4-200$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-2171$$aDE-HGF$$bForschungsbereich Erde und Umwelt$$lErde im Wandel – Unsere Zukunft nachhaltig gestalten$$vFür eine nachhaltige Bio-Ökonomie – von Ressourcen zu Produkten$$x0
001048464 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5112$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x1
001048464 9141_ $$y2025
001048464 915pc $$0PC:(DE-HGF)0000$$2APC$$aAPC keys set
001048464 915pc $$0PC:(DE-HGF)0001$$2APC$$aLocal Funding
001048464 915pc $$0PC:(DE-HGF)0002$$2APC$$aDFG OA Publikationskosten
001048464 915pc $$0PC:(DE-HGF)0003$$2APC$$aDOAJ Journal
001048464 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0160$$2StatID$$aDBCoverage$$bEssential Science Indicators$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0600$$2StatID$$aDBCoverage$$bEbsco Academic Search$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0100$$2StatID$$aJCR$$bPLOS COMPUT BIOL : 2022$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2024-02-08T09:42:16Z
001048464 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2024-02-08T09:42:16Z
001048464 915__ $$0StatID:(DE-HGF)0113$$2StatID$$aWoS$$bScience Citation Index Expanded$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)9900$$2StatID$$aIF < 5$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001048464 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bASC$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2024-12-16
001048464 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2024-12-16
001048464 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001048464 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2024-12-16
001048464 920__ $$lyes
001048464 9201_ $$0I:(DE-Juel1)IBG-4-20200403$$kIBG-4$$lBioinformatik$$x0
001048464 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x1
001048464 980__ $$ajournal
001048464 980__ $$aVDB
001048464 980__ $$aUNRESTRICTED
001048464 980__ $$aI:(DE-Juel1)IBG-4-20200403
001048464 980__ $$aI:(DE-Juel1)JSC-20090406
001048464 980__ $$aAPC
001048464 9801_ $$aAPC
001048464 9801_ $$aFullTexts