Type  | Amount  | VAT  | Currency | Share    | Status         | Cost centre
APC   | 3043.00 | 0.00 | EUR      | 100.00 % | (Payment made) | ZB
Sum   | 3043.00 | 0.00 | EUR      |          |                |
Total | 3043.00 |      |          |          |                |
Journal Article FZJ-2025-04662

OneProt: Towards multi-modal protein foundation models via latent space alignment of sequence, structure, binding sites and text encoders

2025
Public Library of Science, San Francisco, Calif.

PLoS Computational Biology 21(11), e1013679 (2025) [10.1371/journal.pcbi.1013679]

Abstract: Recent advances in Artificial Intelligence have enabled multi-modal systems to model and translate between diverse information spaces. Extending beyond text and vision, we introduce OneProt, a multi-modal Deep Learning model for proteins that integrates structure, sequence, text, and binding site data. Using the ImageBind framework, OneProt aligns the latent spaces of the protein modality encoders in a lightweight fine-tuning scheme that focuses on pairwise alignment of each modality with sequence data, rather than requiring full matches across all modalities. This novel approach combines Graph Neural Networks and transformer architectures. It performs well in retrieval tasks and demonstrates the efficacy of multi-modal systems in Protein Machine Learning across a broad spectrum of downstream baselines, including enzyme function prediction and binding site analysis. Furthermore, OneProt transfers representational information from the specialized encoders to the sequence encoder, improving its ability to distinguish evolutionarily related from unrelated sequences; in the resulting latent space, evolutionarily related proteins align in similar directions. In addition, we extensively investigate modality ablations to identify the encoders that contribute most to predictive performance, highlighting the importance of the binding site encoder, which has not previously been used in similar models. This work expands the horizons of multi-modal protein models, paving the way for transformative applications in drug discovery, biocatalytic reaction planning, and protein engineering.

Contributing Institute(s):
  1. Bioinformatik (IBG-4)
  2. Jülich Supercomputing Centre (JSC)
Research Program(s):
  1. 2171 - Biological and environmental resources for sustainable use (POF4-217)
  2. 5112 - Cross-Domain Algorithms, Tools, Methods Labs (ATMLs) and Research Groups (POF4-511)
  3. Helmholtz AI Consultant Team FB Information (E54.303.11)

Appears in the scientific report 2025
Database coverage:
Medline ; Creative Commons Attribution CC BY 4.0 ; DOAJ ; OpenAccess ; Article Processing Charges ; BIOSIS Previews ; Biological Abstracts ; Clarivate Analytics Master Journal List ; DOAJ Seal ; Ebsco Academic Search ; Essential Science Indicators ; Fees ; IF < 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection
The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > IBG > IBG-4
Workflow collections > Public records
Workflow collections > Publication Charges
Institute Collections > JSC
Publications database
Open Access

 Record created 2025-11-25, last modified 2025-12-12
