001043532 001__ 1043532
001043532 005__ 20250804115159.0
001043532 0247_ $$2doi$$a10.1093/nargab/lqaf021
001043532 0247_ $$2datacite_doi$$a10.34734/FZJ-2025-02908
001043532 0247_ $$2pmid$$a40104673
001043532 0247_ $$2WOS$$aWOS:001446715300001
001043532 037__ $$aFZJ-2025-02908
001043532 082__ $$a570
001043532 1001_ $$0P:(DE-Juel1)195915$$aUpadhyay, Utkarsh$$b0$$ufzj
001043532 245__ $$aNucleoSeeker—precision filtering of RNA databases to curate high-quality datasets
001043532 260__ $$aOxford$$bOxford University Press$$c2025
001043532 3367_ $$2DRIVER$$aarticle
001043532 3367_ $$2DataCite$$aOutput Types/Journal article
001043532 3367_ $$0PUB:(DE-HGF)16$$2PUB:(DE-HGF)$$aJournal Article$$bjournal$$mjournal$$s1753357882_12062
001043532 3367_ $$2BibTeX$$aARTICLE
001043532 3367_ $$2ORCID$$aJOURNAL_ARTICLE
001043532 3367_ $$00$$2EndNote$$aJournal Article
001043532 520__ $$aThe structural prediction of biomolecules via computational methods complements the often involved wet-lab experiments. Unlike protein structure prediction, RNA structure prediction remains a significant challenge in bioinformatics, primarily due to the scarcity of annotated RNA structure data and its varying quality. Many methods have used this limited data to train deep learning models, but redundancy, data leakage, and poor data quality hamper their performance. In this work, we present NucleoSeeker, a tool designed to curate high-quality, tailored datasets from the Protein Data Bank (PDB). It is a unified framework that combines multiple tools and streamlines an otherwise complicated process of data curation. It offers multiple filters at structure, sequence, and annotation levels, giving researchers full control over data curation. Further, we present several use cases. In particular, we demonstrate how NucleoSeeker allows the creation of a nonredundant RNA structure dataset to assess AlphaFold3’s performance for RNA structure prediction. This demonstrates NucleoSeeker’s effectiveness in curating valuable nonredundant tailored datasets to both train novel and judge existing methods. NucleoSeeker is very easy to use, highly flexible, and can significantly increase the quality of RNA structure datasets.
001043532 536__ $$0G:(DE-HGF)POF4-5111$$a5111 - Domain-Specific Simulation & Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)$$cPOF4-511$$fPOF IV$$x0
001043532 536__ $$0G:(DE-Juel-1)E.40401.62$$aHelmholtz AI - Helmholtz Artificial Intelligence Coordination Unit – Local Unit FZJ (E.40401.62)$$cE.40401.62$$x1
001043532 588__ $$aDataset connected to CrossRef, Journals: juser.fz-juelich.de
001043532 7001_ $$0P:(DE-Juel1)177018$$aPucci, Fabrizio$$b1
001043532 7001_ $$0P:(DE-HGF)0$$aHerold, Julian$$b2
001043532 7001_ $$0P:(DE-Juel1)173652$$aSchug, Alexander$$b3$$eCorresponding author
001043532 773__ $$0PERI:(DE-600)3009998-5$$a10.1093/nargab/lqaf021$$gVol. 7, no. 1, p. lqaf021$$n1$$plqaf021$$tNAR: genomics and bioinformatics$$v7$$x2631-9268$$y2025
001043532 8564_ $$u//juser.fz-juelich.de/record/1043532/files/Invoice_SOA25LT002573.pdf
001043532 8564_ $$uhttps://juser.fz-juelich.de/record/1043532/files/Invoice_SOA25LT002573.pdf
001043532 8564_ $$uhttps://juser.fz-juelich.de/record/1043532/files/lqaf021.pdf$$yOpenAccess
001043532 8767_ $$8SOA25LT002573$$92025-02-28$$a1200215375$$d2025-07-09$$eAPC$$jZahlung erfolgt
001043532 909CO $$ooai:juser.fz-juelich.de:1043532$$pdnbdelivery$$popenCost$$pVDB$$pdriver$$pOpenAPC$$popen_access$$popenaire
001043532 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)195915$$aForschungszentrum Jülich$$b0$$kFZJ
001043532 9101_ $$0I:(DE-588b)5008462-8$$6P:(DE-Juel1)173652$$aForschungszentrum Jülich$$b3$$kFZJ
001043532 9131_ $$0G:(DE-HGF)POF4-511$$1G:(DE-HGF)POF4-510$$2G:(DE-HGF)POF4-500$$3G:(DE-HGF)POF4$$4G:(DE-HGF)POF$$9G:(DE-HGF)POF4-5111$$aDE-HGF$$bKey Technologies$$lEngineering Digital Futures – Supercomputing, Data Management and Information Security for Knowledge and Action$$vEnabling Computational- & Data-Intensive Science and Engineering$$x0
001043532 9141_ $$y2025
001043532 915pc $$0PC:(DE-HGF)0000$$2APC$$aAPC keys set
001043532 915pc $$0PC:(DE-HGF)0003$$2APC$$aDOAJ Journal
001043532 915__ $$0StatID:(DE-HGF)0200$$2StatID$$aDBCoverage$$bSCOPUS$$d2025-01-02
001043532 915__ $$0StatID:(DE-HGF)1050$$2StatID$$aDBCoverage$$bBIOSIS Previews$$d2025-01-02
001043532 915__ $$0StatID:(DE-HGF)1190$$2StatID$$aDBCoverage$$bBiological Abstracts$$d2025-01-02
001043532 915__ $$0LIC:(DE-HGF)CCBY4$$2HGFVOC$$aCreative Commons Attribution CC BY 4.0
001043532 915__ $$0StatID:(DE-HGF)0112$$2StatID$$aWoS$$bEmerging Sources Citation Index$$d2025-01-02
001043532 915__ $$0StatID:(DE-HGF)0501$$2StatID$$aDBCoverage$$bDOAJ Seal$$d2024-04-03T10:37:58Z
001043532 915__ $$0StatID:(DE-HGF)0500$$2StatID$$aDBCoverage$$bDOAJ$$d2024-04-03T10:37:58Z
001043532 915__ $$0StatID:(DE-HGF)0700$$2StatID$$aFees$$d2025-01-02
001043532 915__ $$0StatID:(DE-HGF)0150$$2StatID$$aDBCoverage$$bWeb of Science Core Collection$$d2025-01-02
001043532 915__ $$0StatID:(DE-HGF)0510$$2StatID$$aOpenAccess
001043532 915__ $$0StatID:(DE-HGF)0030$$2StatID$$aPeer Review$$bDOAJ : Anonymous peer review$$d2024-04-03T10:37:58Z
001043532 915__ $$0StatID:(DE-HGF)0561$$2StatID$$aArticle Processing Charges$$d2025-01-02
001043532 915__ $$0StatID:(DE-HGF)0300$$2StatID$$aDBCoverage$$bMedline$$d2025-01-02
001043532 915__ $$0StatID:(DE-HGF)0199$$2StatID$$aDBCoverage$$bClarivate Analytics Master Journal List$$d2025-01-02
001043532 9201_ $$0I:(DE-Juel1)JSC-20090406$$kJSC$$lJülich Supercomputing Center$$x0
001043532 980__ $$ajournal
001043532 980__ $$aVDB
001043532 980__ $$aUNRESTRICTED
001043532 980__ $$aI:(DE-Juel1)JSC-20090406
001043532 980__ $$aAPC
001043532 9801_ $$aAPC
001043532 9801_ $$aFullTexts