TopDomain: Exhaustive Protein Domain Boundary Metaprediction Combining Multisource Information and Deep Learning

Mulnaes, Daniel; Gohlke, Holger; Koenig, Filip; Golchin, Pegah
doi:10.1021/acs.jctc.1c00129
% IMPORTANT: The following is UTF-8 encoded.  This means that in the presence
% of non-ASCII characters, it will not work with BibTeX 0.99 or older.
% Instead, you should use an up-to-date BibTeX implementation like “bibtex8” or
% “biber”.

@ARTICLE{Mulnaes:893374,
      author       = {Mulnaes, Daniel and Golchin, Pegah and Koenig, Filip and
                      Gohlke, Holger},
      title        = {{T}op{D}omain: {E}xhaustive {P}rotein {D}omain {B}oundary
                      {M}etaprediction {C}ombining {M}ultisource {I}nformation and
                      {D}eep {L}earning},
      journal      = {Journal of Chemical Theory and Computation},
      volume       = {17},
      number       = {7},
      issn         = {1549-9626},
      address      = {Washington, DC},
      reportid     = {FZJ-2021-02715},
      pages        = {4599--4613},
      year         = {2021},
      abstract     = {Protein domains are independent, functional, and stable
                      structural units of proteins. Accurate protein domain
                      boundary prediction plays an important role in understanding
                      protein structure and evolution, as well as for protein
                      structure prediction. Current domain boundary prediction
                      methods differ in terms of boundary definition, methodology,
                      and training databases resulting in disparate performance
                      for different proteins. We developed TopDomain, an
                      exhaustive metapredictor, that uses deep neural networks to
                      combine multisource information from sequence- and
                      homology-based features of over 50 primary predictors. For
                      this purpose, we developed a new domain boundary data set
                      termed the TopDomain data set, in which the true annotations
                      are informed by SCOPe annotations, structural domain
                      parsers, human inspection, and deep learning. We benchmark
                      TopDomain against 2484 targets with 3354 boundaries from the
                      TopDomain test set and achieve F1 scores of $78.4\%$ and
                      $73.8\%$ for multidomain boundary prediction within ±20
                      residues and ±10 residues of the true boundary,
                      respectively. When examined on targets from CASP11-13
                      competitions, TopDomain achieves F1 scores of $47.5\%$ and
                      $42.8\%$ for multidomain proteins. TopDomain significantly
                      outperforms 15 widely used, state-of-the-art ab initio and
                      homology-based domain boundary predictors. Finally, we
                      implemented TopDomainTMC, which accurately predicts whether
                      domain parsing is necessary for the target protein.},
      cin          = {JSC / NIC / IBI-7 / IBG-4},
      ddc          = {610},
      cid          = {I:(DE-Juel1)JSC-20090406 / I:(DE-Juel1)NIC-20090406 /
                      I:(DE-Juel1)IBI-7-20200312 / I:(DE-Juel1)IBG-4-20200403},
      pnm          = {5111 - Domain-Specific Simulation Data Life Cycle Labs
                      (SDLs) and Research Groups (POF4-511) / 2171 - Biological
                      and environmental resources for sustainable use (POF4-217) /
                      2172 - Utilization of renewable carbon and energy sources
                      and engineering of ecosystem functions (POF4-217) /
                      Forschergruppe Gohlke (hkf7\_20200501) / DFG project
                      267205415 - SFB 1208: Identität und Dynamik von
                      Membransystemen - von Molekülen bis zu zellulären
                      Funktionen},
      pid          = {G:(DE-HGF)POF4-5111 / G:(DE-HGF)POF4-2171 /
                      G:(DE-HGF)POF4-2172 / G:(DE-Juel1)hkf7\_20200501 /
                      G:(GEPRIS)267205415},
      typ          = {PUB:(DE-HGF)16},
      pubmed       = {34161735},
      UT           = {WOS:000674289800059},
      doi          = {10.1021/acs.jctc.1c00129},
      url          = {https://juser.fz-juelich.de/record/893374},
}