Conference Presentation (After Call) FZJ-2025-05562

NucleoSeeker - Precision filtering of RNA databases to curate high-quality datasets


2025

DPG-Frühjahrstagungen 2025 (DPG Spring Meetings 2025), DPG 2025, HHU Düsseldorf, Regensburg, HHU Düsseldorf, Germany, 16 Mar 2025 - 21 Mar 2025

Abstract: Computational structure prediction of biomolecules complements the often complex wet-lab experiments. Unlike protein structure prediction, RNA structure prediction remains a significant challenge in bioinformatics, primarily because RNA structure data are scarce and vary in quality. Many methods have used this limited data to train deep learning models, but redundancy, data leakage and poor data quality hamper their performance. In this work, we present NucleoSeeker, a tool designed to curate high-quality, tailored datasets from the Protein Data Bank (PDB). It is a unified framework that combines multiple tools and streamlines the otherwise complicated process of data curation. It offers multiple filters at the structure, sequence and annotation levels, giving researchers full control over data curation. We also present several use cases. In particular, we demonstrate how NucleoSeeker allows the creation of a non-redundant RNA structure dataset to assess AlphaFold3's performance in RNA structure prediction. This demonstrates NucleoSeeker's effectiveness in curating non-redundant, tailored datasets both for training novel methods and for evaluating existing ones. NucleoSeeker is easy to use, highly flexible and can significantly increase the quality of RNA structure datasets.
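
As a rough illustration of what filtering at the structure, sequence and annotation levels can look like, the following minimal Python sketch applies one filter of each kind and then removes near-duplicate sequences. It is not NucleoSeeker's interface: `RnaEntry`, `curate`, the field names and all thresholds are hypothetical, and the `difflib` similarity ratio is only a crude stand-in for an alignment-based sequence identity.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass(frozen=True)
class RnaEntry:
    """Hypothetical record for one RNA chain; the fields are illustrative."""
    pdb_id: str
    sequence: str
    resolution: float  # in angstroms
    method: str        # experimental method annotation


def similarity(a: str, b: str) -> float:
    """Crude similarity in [0, 1]; an assumed stand-in for sequence identity."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def curate(entries, max_resolution=3.5, min_length=10,
           allowed_methods=("X-RAY DIFFRACTION",), max_similarity=0.8):
    """Apply per-entry filters, then greedily drop redundant sequences."""
    # Structure level (resolution), annotation level (method),
    # sequence level (minimum length).
    kept = [e for e in entries
            if e.resolution <= max_resolution
            and e.method in allowed_methods
            and len(e.sequence) >= min_length]
    # Visit best-resolved entries first so they become the representatives.
    kept.sort(key=lambda e: e.resolution)
    representatives = []
    for e in kept:
        # Keep an entry only if it is dissimilar to every representative so far.
        if all(similarity(e.sequence, r.sequence) <= max_similarity
               for r in representatives):
            representatives.append(e)
    return representatives
```

The greedy pass keeps the best-resolved member of each group of similar sequences, which is one simple way to reduce redundancy and the risk of leakage between training and test sets; a real pipeline would typically rely on a dedicated clustering or alignment tool instead.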


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
Research Program(s):
  1. 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)

Appears in the scientific report 2025

The record appears in these collections:
Document types > Presentations > Conference talks
Workflow collections > Public records
Institute collections > JSC
Publication database

 Record created 2025-12-17, last modified 2026-01-06


