Journal Article FZJ-2021-02715

TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning


2021
Washington, DC

Journal of Chemical Theory and Computation 17(7), 4599–4613 (2021) [10.1021/acs.jctc.1c00129]


Please use a persistent id in citations: doi:10.1021/acs.jctc.1c00129

Abstract: Protein domains are independent, functional, and stable structural units of proteins. Accurate prediction of protein domain boundaries is important for understanding protein structure and evolution, as well as for protein structure prediction. Current domain boundary prediction methods differ in boundary definition, methodology, and training databases, resulting in disparate performance for different proteins. We developed TopDomain, an exhaustive metapredictor that uses deep neural networks to combine multisource information from sequence- and homology-based features of over 50 primary predictors. For this purpose, we developed a new domain boundary data set, termed the TopDomain data set, in which the true annotations are informed by SCOPe annotations, structural domain parsers, human inspection, and deep learning. We benchmark TopDomain against 2484 targets with 3354 boundaries from the TopDomain test set and achieve F1 scores of 78.4% and 73.8% for multidomain boundary prediction within ±20 residues and ±10 residues of the true boundary, respectively. When examined on targets from the CASP11-13 competitions, TopDomain achieves F1 scores of 47.5% and 42.8% for multidomain proteins. TopDomain significantly outperforms 15 widely used, state-of-the-art ab initio and homology-based domain boundary predictors. Finally, we implemented TopDomainTMC, which accurately predicts whether domain parsing is necessary for the target protein.
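
The abstract reports F1 scores for boundaries predicted within ±20 or ±10 residues of the true boundary. The following is a minimal sketch of how such a tolerance-based F1 score could be computed. The function name `boundary_f1` and the greedy one-to-one matching rule are assumptions made for illustration; this record does not state the exact matching protocol used in the article.

```python
from typing import Iterable, Tuple


def boundary_f1(
    predicted: Iterable[int],
    true: Iterable[int],
    tolerance: int = 20,
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for domain boundary positions.

    Assumption: a predicted boundary is a true positive if it lies within
    `tolerance` residues of a true boundary that has not been matched yet
    (greedy one-to-one matching, hypothetical protocol).
    """
    pred = sorted(predicted)
    unmatched = sorted(true)
    n_true = len(unmatched)
    true_positives = 0

    for p in pred:
        # Closest true boundary that is still unmatched, if any remain.
        closest = min(unmatched, key=lambda t: abs(t - p), default=None)
        if closest is not None and abs(closest - p) <= tolerance:
            unmatched.remove(closest)
            true_positives += 1

    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / n_true if n_true else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Made-up residue indices, for illustration only.
    print(boundary_f1([100, 250], [110, 400], tolerance=20))
```

In this sketch, tightening `tolerance` from 20 to 10 can only keep or reduce the number of matches, which is in line with the lower score the abstract reports for the ±10 window.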


Contributing Institute(s):
  1. Jülich Supercomputing Center (JSC)
  2. John von Neumann - Institut für Computing (NIC)
  3. Strukturbiochemie (IBI-7)
  4. Bioinformatik (IBG-4)
Research Program(s):
  1. 5111 - Domain-Specific Simulation Data Life Cycle Labs (SDLs) and Research Groups (POF4-511)
  2. 2171 - Biological and environmental resources for sustainable use (POF4-217)
  3. 2172 - Utilization of renewable carbon and energy sources and engineering of ecosystem functions (POF4-217)
  4. Forschergruppe Gohlke (hkf7_20200501)
  5. DFG project 267205415 - SFB 1208: Identität und Dynamik von Membransystemen - von Molekülen bis zu zellulären Funktionen

Appears in the scientific report 2021
Database coverage:
Medline ; Embargoed OpenAccess ; Clarivate Analytics Master Journal List ; Current Contents - Physical, Chemical and Earth Sciences ; Essential Science Indicators ; IF >= 5 ; JCR ; SCOPUS ; Science Citation Index Expanded ; Web of Science Core Collection

The record appears in these collections:
Document types > Articles > Journal Article
Institute Collections > IBI > IBI-7
Institute Collections > IBG > IBG-4
Workflow collections > Public records
Institute Collections > JSC
Publications database
Open Access
NIC

 Record created 2021-06-25, last modified 2023-08-15


Published on 2021-06-23. Available in OpenAccess from 2022-06-23.