Multi-modal integration for biological tasks: perks, caveats and applications
Talk (non-conference) (Invited) | FZJ-2026-00934
2026
Abstract: In this talk, I will present OneProt, a versatile artificial intelligence framework for protein analysis that leverages multi-modal integration across structural, sequence, textual, and binding-site data. To align these heterogeneous modalities, OneProt adopts an ImageBind-inspired training strategy, enabling efficient cross-modal representation learning without requiring fully paired data. By combining graph neural networks and transformer-based architectures, OneProt achieves strong performance across tasks such as enzyme function prediction and binding-site analysis. I will highlight two key features of the framework: its ability to seamlessly incorporate custom modalities during pre-training, and a lightweight fine-tuning strategy that relies only on a simple multi-layer perceptron projection. Through empirical results, I will demonstrate how multi-modal integration can reduce the reliance on large task-specific datasets while maintaining competitive downstream performance. Alongside these benefits, I will discuss the practical challenges and caveats of adding new modalities, including alignment noise, modality imbalance, and training stability. Finally, I will present preliminary results from a follow-up project, OneProtGPT, which integrates OneProt with scientific large language models to enable cross-modal retrieval and the integration of protein representations with natural language.
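The abstract's central technical claim is that an ImageBind-inspired objective aligns heterogeneous modalities without fully paired data: each non-anchor modality only needs pairs with a single anchor modality (e.g. sequence), rather than tuples spanning all modalities at once. As a minimal illustration of that style of objective (not the actual OneProt implementation, whose architecture and loss details are not given here), the sketch below computes a symmetric InfoNCE loss between a batch of anchor embeddings and a batch of embeddings from one other modality; the names and the temperature value are illustrative assumptions.

```python
import numpy as np

def info_nce(anchor, other, temperature=0.07):
    """Symmetric InfoNCE loss between two batches of embeddings.

    Row i of `anchor` and row i of `other` are a positive pair; every
    other row in the batch acts as a negative. Applying this loss
    pairwise between one anchor modality and each remaining modality
    is the ImageBind-style alignment strategy: no sample ever needs
    to be observed in all modalities simultaneously.
    """
    # L2-normalise so the dot product is cosine similarity.
    a = anchor / np.linalg.norm(anchor, axis=1, keepdims=True)
    b = other / np.linalg.norm(other, axis=1, keepdims=True)
    logits = a @ b.T / temperature          # (batch, batch) similarity matrix
    idx = np.arange(len(a))                 # positives sit on the diagonal

    def cross_entropy(l):
        l = l - l.max(axis=1, keepdims=True)            # numerical stability
        log_probs = l - np.log(np.exp(l).sum(axis=1, keepdims=True))
        return -log_probs[idx, idx].mean()

    # Average both directions: anchor->other and other->anchor.
    return 0.5 * (cross_entropy(logits) + cross_entropy(logits.T))
```

In a multi-modal setting one would sum this loss over (anchor, modality) pairs, e.g. sequence–structure, sequence–text, and sequence–binding-site batches, each drawn from its own partially paired dataset; the "lightweight fine-tuning" described in the abstract would then train only a small MLP on top of the frozen aligned embeddings.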