| NucleicBERT: Deciphering The Language of Nucleic Acids |
| Poster (After Call) | FZJ-2025-05561 |
2025
Abstract: Determining the 3D structure of biomolecules has been a focal point of computational biology for many decades. Experimental techniques for determining the tertiary structure of RNA, such as NMR and X-ray crystallography, are limited by high cost and limited resolution. Recent advances in cryo-EM technology have helped, but these shortcomings persist. As a result, various computational techniques have been developed for RNA structure prediction. In recent years, deep learning methods based on convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers have significantly improved protein structure prediction. However, applying these methods directly to RNA structure prediction is hampered by the limited availability of RNA structure data: although advances in sequencing technologies have produced an abundance of RNA primary sequence data, the lack of annotated 3D structure data makes it difficult to fully leverage these sequences. To address this challenge, we propose the use of machine learning techniques that can operate with limited training data. Here, we introduce NucleicBERT, a language model based on the BERT architecture and designed to predict critical RNA structural features such as contact maps, distance maps, secondary structures, and three-dimensional spatial arrangements. NucleicBERT focuses on the complex relationship between RNA sequence and structure. Its key innovation is a precision-focused methodology that eliminates the need for extensive feature engineering and does not rely on evolutionary information. The model provides an accurate and versatile tool for analyzing diverse RNA sequences and enhancing computational biology methodologies.
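To make two of the prediction targets named in the abstract concrete, the following minimal sketch shows how a distance map and a contact map relate to each other when 3D coordinates are known. It is not NucleicBERT code and says nothing about how the model itself is trained or evaluated. The function names, the one-representative-atom-per-nucleotide simplification, the 8 Å cutoff, and the toy coordinates are all assumptions chosen for illustration.

```python
import numpy as np


def distance_map(coords):
    """Pairwise Euclidean distances between residues.

    coords: array of shape (n_residues, 3), one representative atom per
    nucleotide (an illustrative simplification, not taken from the abstract).
    """
    coords = np.asarray(coords, dtype=float)
    # Broadcasting yields an (n, n, 3) array of coordinate differences.
    diff = coords[:, None, :] - coords[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def contact_map(dist, threshold=8.0, min_separation=1):
    """Binary contact map obtained by thresholding a distance map.

    threshold (in angstroms) and min_separation are hypothetical defaults;
    the abstract does not state which values NucleicBERT uses.
    """
    idx = np.arange(dist.shape[0])
    # Ignore pairs closer than min_separation along the sequence
    # (with the default of 1 this only removes the diagonal).
    far_enough = np.abs(idx[:, None] - idx[None, :]) >= min_separation
    return (dist < threshold) & far_enough


# Toy coordinates for four nucleotides (made-up values).
coords = np.array([
    [0.0, 0.0, 0.0],
    [5.0, 0.0, 0.0],
    [10.0, 0.0, 0.0],
    [5.0, 5.0, 0.0],
])
dist = distance_map(coords)
contacts = contact_map(dist)
```

Under these assumptions, a contact map is a thresholded view of the distance map, so a model that predicts distances can in principle be converted to contacts, while the reverse conversion loses information.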