Improved Contact Map Prediction Using Weighted Naïve Bayes Classifier and Extremely Randomized Trees

CLC number: TP391.4

Funding: Supported by the National Natural Science Foundation of China (61373062, 61772273).

    Abstract:

    Accurate prediction of residue-residue contacts greatly aids ab initio protein folding and 3D structure modeling, because accurately predicted contacts impose useful constraints on structure assembly. Recent CASP experiments have seen rapid progress on this topic, and a number of promising protein contact map predictors have emerged in the past decades. Despite this progress, challenges remain, such as low prediction accuracy for long-range contacts. Here we present a new meta-based (ensemble learning) predictor, TargetPCM, for high-accuracy protein contact map prediction, particularly for medium- and long-range contacts. First, TargetPCM combines the outputs of three existing contact map predictors using a weighted Naïve Bayes classifier (WNBC), whose weight parameters are optimized with the particle swarm optimization (PSO) algorithm. Second, the WNBC outputs are combined with sequence-based features to obtain more discriminative features, on which the final prediction model is trained with extremely randomized trees (ERT). Tested on 98 non-redundant proteins, TargetPCM improves the Top L/5 accuracy over the best existing meta-based predictor (NeBcon) by 8.2%, 16.1% and 5.3% for short-, medium- and long-range contacts, respectively. Further validation on CASP11 shows that TargetPCM improves the Top L/5 accuracy over the best existing co-evolution-based meta-server predictor (MetaPSICOV) by 7.4%, 9.1% and 7.5% for short-, medium- and long-range contacts, respectively. Detailed analysis of the experimental results shows that both the effective use of complementary information from the base predictors and the strong learning capability of ERT account for the performance improvements of TargetPCM over existing contact map predictors.
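
The Top L/5 accuracy reported in the abstract can be illustrated with a minimal sketch. It is not the authors' evaluation code. It assumes the separation ranges customary in CASP-style evaluations (short: 6 to 11 residues, medium: 12 to 23, long: 24 or more), which the abstract does not state, and it divides by the number of pairs actually selected, which is one of several conventions in use. The function name `top_l_over_k_precision`, the `RANGES` table and the toy data are hypothetical and introduced only for illustration.

```python
from typing import Dict, Optional, Set, Tuple

Pair = Tuple[int, int]

# Sequence-separation ranges |i - j| (assumed CASP-style convention;
# not stated in the abstract). None means "no upper bound".
RANGES: Dict[str, Tuple[int, Optional[int]]] = {
    "short": (6, 11),
    "medium": (12, 23),
    "long": (24, None),
}


def _norm(pair: Pair) -> Pair:
    """Order a residue pair so that the smaller index comes first."""
    i, j = pair
    return (i, j) if i < j else (j, i)


def top_l_over_k_precision(
    scores: Dict[Pair, float],
    true_contacts: Set[Pair],
    seq_len: int,
    range_name: str,
    k: int = 5,
) -> float:
    """Fraction of true contacts among the L/k highest-scored pairs in a range."""
    lo, hi = RANGES[range_name]
    truth = {_norm(p) for p in true_contacts}

    # Keep only the pairs whose sequence separation falls in the chosen range.
    candidates = []
    for pair, score in scores.items():
        i, j = _norm(pair)
        sep = j - i
        if sep >= lo and (hi is None or sep <= hi):
            candidates.append((score, (i, j)))

    # Rank by predicted contact score, highest first, and keep the top L/k.
    candidates.sort(key=lambda item: item[0], reverse=True)
    n_top = max(1, seq_len // k)
    top = candidates[:n_top]
    if not top:
        return 0.0

    hits = sum(1 for _, pair in top if pair in truth)
    return hits / len(top)


# Toy data (made up): a 10-residue chain, so L/5 = 2 pairs are evaluated.
toy_scores = {(0, 7): 0.9, (1, 8): 0.8, (2, 9): 0.4, (0, 6): 0.1, (3, 4): 0.99}
toy_truth = {(0, 7), (2, 9)}
print(top_l_over_k_precision(toy_scores, toy_truth, seq_len=10, range_name="short"))  # 0.5
```

In the toy example the pair (3, 4) is ignored despite its high score because its separation lies outside the short range. Of the two top-ranked short-range pairs, only (0, 7) is a true contact, which gives a precision of 0.5.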

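Cite this article

JIN Kangrong, YU Dongjun. Improved contact map prediction using weighted Naïve Bayes classifier and extremely randomized trees[J]. Journal of Nanjing University of Aeronautics & Astronautics, 2018, 50(5): 619-628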

History
  • Received: 2017-11-01
  • Revised: 2017-12-28
  • Published online: 2018-10-29