Journal of Jilin University (Engineering and Technology Edition), 2017, Vol. 47, Issue (2): 639-646. doi: 10.13229/j.cnki.jdxbgxb201702040
YUAN Zhe-ming (袁哲明)1,2, ZHANG Hong-yang (张弘杨)1,2, CHEN Yuan (陈渊)1
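Abstract: To improve the accuracy of HIV-1 protease cleavage site prediction, a prediction model based on feature selection and a support vector machine (SVM) is proposed. First, a dataset of 5830 HIV-1 protease cleavage site samples is analyzed, and, following the minimum-redundancy maximum-relevance (mRMR) principle, a self-terminating selection procedure is used to choose the feature vector for cleavage sites. Next, the feature vector is fed to an SVM for learning and training to build a classification model for HIV-1 protease cleavage sites. Finally, simulation tests are carried out with the simulation toolbox of Matlab 2014. The experimental results show that, while using the fewest features, the proposed model predicts cleavage sites more accurately than the reference models and results reported in the literature, and that the selected features are readily interpretable and biologically meaningful.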
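The feature-selection step described in the abstract can be illustrated with a minimal sketch of greedy mRMR selection. This is not the authors' implementation (they report using Matlab 2014). It assumes discrete features, mutual information as the relevance and redundancy measure, and a stopping rule that halts once the best remaining candidate scores no higher than zero. The abstract does not state the actual termination criterion, so that rule, the function names, and the toy feature names `P1`, `P1_dup`, and `P2_noise` are all hypothetical.

```python
import math
from collections import Counter


def mutual_information(xs, ys):
    """Mutual information (in nats) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    return sum(
        (c / n) * math.log(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels):
    """Greedy mRMR selection over discrete feature columns.

    columns: dict mapping feature name -> list of discrete values.
    Assumed stopping rule (not taken from the paper): stop when the best
    remaining candidate has relevance minus mean redundancy <= 0.
    """
    relevance = {f: mutual_information(v, labels) for f, v in columns.items()}
    selected = []
    remaining = set(columns)

    def score(f):
        # Relevance to the labels minus mean redundancy with chosen features.
        if not selected:
            return relevance[f]
        redundancy = sum(
            mutual_information(columns[f], columns[s]) for s in selected
        ) / len(selected)
        return relevance[f] - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic.
        best = max(sorted(remaining), key=score)
        if score(best) <= 0:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: 1 = cleaved, 0 = not cleaved (hypothetical, for illustration only).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "P1": ["F", "F", "F", "F", "A", "A", "A", "A"],        # informative
    "P1_dup": ["F", "F", "F", "F", "A", "A", "A", "A"],    # fully redundant with P1
    "P2_noise": ["A", "G", "A", "G", "A", "G", "A", "G"],  # independent of labels
}
chosen = mrmr_select(columns, labels)
print(chosen)
```

On this toy input the procedure keeps the informative column and stops before adding the duplicate or the uninformative one. In a full pipeline the retained columns would then be encoded numerically and passed to an SVM classifier, as the abstract describes.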