Poster (After Call) FZJ-2025-05561

NucleicBERT: Deciphering The Language of Nucleic Acids

2025

Biophysical Society Meeting 2025, BPS2025, HHU Düsseldorf, Los Angeles, USA, 15 Feb 2025 - 19 Feb 2025

Abstract: Determining the 3D structure of biomolecules has been a focal point of computational biology for decades. Experimental techniques for determining the tertiary structure of RNA, such as NMR and X-ray crystallography, are limited by high cost and limited resolution. Recent advances in cryo-EM have helped, but these shortcomings persist. As a result, various computational techniques have been developed for RNA structure prediction. In recent years, deep learning approaches such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and transformers have significantly improved protein structure prediction. However, applying these methods directly to RNA structure prediction is difficult because little RNA structure data is available. While advances in sequencing technologies have provided an abundance of RNA primary sequence data, the lack of annotated 3D structure data makes it difficult to fully leverage these sequences. To address this challenge, we propose the use of machine learning techniques that can operate with limited training data. Here, we introduce NucleicBERT, a language model based on the BERT architecture and designed to predict critical RNA structural features such as contact maps, distance maps, secondary structures, and three-dimensional spatial arrangements. NucleicBERT focuses on the complex relationship between RNA sequence and structure. Its key innovation lies in its precision-focused methodology, which eliminates the need for extensive feature engineering and does not rely on evolutionary information. This model represents a paradigm shift, providing an accurate and versatile tool for analyzing diverse RNA sequences and enhancing computational biology methodologies.


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)

Appears in the scientific report 2025

The record appears in these collections:
Document types > Presentations > Poster
Workflow collections > Public records
Institute collections > JSC
Publications database

Record created 2025-12-17, last modified 2026-01-06


