001     1043532
005     20250804115159.0
024 7 _ |a 10.1093/nargab/lqaf021
|2 doi
024 7 _ |a 10.34734/FZJ-2025-02908
|2 datacite_doi
024 7 _ |a 40104673
|2 pmid
024 7 _ |a WOS:001446715300001
|2 WOS
037 _ _ |a FZJ-2025-02908
082 _ _ |a 570
100 1 _ |a Upadhyay, Utkarsh
|0 P:(DE-Juel1)195915
|b 0
|u fzj
245 _ _ |a NucleoSeeker—precision filtering of RNA databases to curate high-quality datasets
260 _ _ |a Oxford
|c 2025
|b Oxford University Press
336 7 _ |a article
|2 DRIVER
336 7 _ |a Output Types/Journal article
|2 DataCite
336 7 _ |a Journal Article
|b journal
|m journal
|0 PUB:(DE-HGF)16
|s 1753357882_12062
|2 PUB:(DE-HGF)
336 7 _ |a ARTICLE
|2 BibTeX
336 7 _ |a JOURNAL_ARTICLE
|2 ORCID
336 7 _ |a Journal Article
|0 0
|2 EndNote
520 _ _ |a The structural prediction of biomolecules via computational methods complements the often involved wet-lab experiments. Unlike protein structure prediction, RNA structure prediction remains a significant challenge in bioinformatics, primarily due to the scarcity of annotated RNA structure data and its varying quality. Many methods have used this limited data to train deep learning models, but redundancy, data leakage, and poor data quality hamper their performance. In this work, we present NucleoSeeker, a tool designed to curate high-quality, tailored datasets from the Protein Data Bank (PDB). It is a unified framework that combines multiple tools and streamlines an otherwise complicated process of data curation. It offers multiple filters at structure, sequence, and annotation levels, giving researchers full control over data curation. Further, we present several use cases. In particular, we demonstrate how NucleoSeeker allows the creation of a nonredundant RNA structure dataset to assess AlphaFold3’s performance for RNA structure prediction. This demonstrates NucleoSeeker’s effectiveness in curating valuable nonredundant tailored datasets to both train novel methods and judge existing ones. NucleoSeeker is very easy to use, highly flexible, and can significantly increase the quality of RNA structure datasets.
536 _ _ |a 5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
|0 G:(DE-HGF)POF4-5111
|c POF4-511
|f POF IV
|x 0
536 _ _ |a Helmholtz AI - Helmholtz Artificial Intelligence Coordination Unit – Local Unit FZJ (E.40401.62)
|0 G:(DE-Juel-1)E.40401.62
|c E.40401.62
|x 1
588 _ _ |a Dataset connected to CrossRef, Journals: juser.fz-juelich.de
700 1 _ |a Pucci, Fabrizio
|0 P:(DE-Juel1)177018
|b 1
700 1 _ |a Herold, Julian
|0 P:(DE-HGF)0
|b 2
700 1 _ |a Schug, Alexander
|0 P:(DE-Juel1)173652
|b 3
|e Corresponding author
773 _ _ |a 10.1093/nargab/lqaf021
|g Vol. 7, no. 1, p. lqaf021
|0 PERI:(DE-600)3009998-5
|n 1
|p lqaf021
|t NAR: genomics and bioinformatics
|v 7
|y 2025
|x 2631-9268
856 4 _ |u https://juser.fz-juelich.de/record/1043532/files/Invoice_SOA25LT002573.pdf
856 4 _ |u https://juser.fz-juelich.de/record/1043532/files/lqaf021.pdf
|y OpenAccess
909 C O |o oai:juser.fz-juelich.de:1043532
|p openaire
|p open_access
|p OpenAPC
|p driver
|p VDB
|p openCost
|p dnbdelivery
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 0
|6 P:(DE-Juel1)195915
910 1 _ |a Forschungszentrum Jülich
|0 I:(DE-588b)5008462-8
|k FZJ
|b 3
|6 P:(DE-Juel1)173652
913 1 _ |a DE-HGF
|b Key Technologies
|l Engineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action
|1 G:(DE-HGF)POF4-510
|0 G:(DE-HGF)POF4-511
|3 G:(DE-HGF)POF4
|2 G:(DE-HGF)POF4-500
|4 G:(DE-HGF)POF
|v Enabling Computational- & Data-Intensive Science and Engineering
|9 G:(DE-HGF)POF4-5111
|x 0
914 1 _ |y 2025
915 p c |a APC keys set
|0 PC:(DE-HGF)0000
|2 APC
915 p c |a DOAJ Journal
|0 PC:(DE-HGF)0003
|2 APC
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0200
|2 StatID
|b SCOPUS
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1050
|2 StatID
|b BIOSIS Previews
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)1190
|2 StatID
|b Biological Abstracts
|d 2025-01-02
915 _ _ |a Creative Commons Attribution CC BY 4.0
|0 LIC:(DE-HGF)CCBY4
|2 HGFVOC
915 _ _ |a WoS
|0 StatID:(DE-HGF)0112
|2 StatID
|b Emerging Sources Citation Index
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0501
|2 StatID
|b DOAJ Seal
|d 2024-04-03T10:37:58Z
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0500
|2 StatID
|b DOAJ
|d 2024-04-03T10:37:58Z
915 _ _ |a Fees
|0 StatID:(DE-HGF)0700
|2 StatID
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0150
|2 StatID
|b Web of Science Core Collection
|d 2025-01-02
915 _ _ |a OpenAccess
|0 StatID:(DE-HGF)0510
|2 StatID
915 _ _ |a Peer Review
|0 StatID:(DE-HGF)0030
|2 StatID
|b DOAJ : Anonymous peer review
|d 2024-04-03T10:37:58Z
915 _ _ |a Article Processing Charges
|0 StatID:(DE-HGF)0561
|2 StatID
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0300
|2 StatID
|b Medline
|d 2025-01-02
915 _ _ |a DBCoverage
|0 StatID:(DE-HGF)0199
|2 StatID
|b Clarivate Analytics Master Journal List
|d 2025-01-02
920 1 _ |0 I:(DE-Juel1)JSC-20090406
|k JSC
|l Jülich Supercomputing Center
|x 0
980 _ _ |a journal
980 _ _ |a VDB
980 _ _ |a UNRESTRICTED
980 _ _ |a I:(DE-Juel1)JSC-20090406
980 _ _ |a APC
980 1 _ |a APC
980 1 _ |a FullTexts

