Journal of Jilin University (Engineering and Technology Edition), 2017, Vol. 47, Issue (2): 639-646. doi: 10.13229/j.cnki.jdxbgxb201702040

HIV-1 protease cleavage site prediction based on feature selection and support vector machine

YUAN Zhe-ming (1, 2), ZHANG Hong-yang (1, 2), CHEN Yuan (1)

  1. Hunan Provincial Key Laboratory of Crop Germplasm Innovation and Utilization, Hunan Agricultural University, Changsha 410128, China;
  2. Hunan Provincial Key Laboratory for Biology and Control of Plant Diseases and Insect Pests, Hunan Agricultural University, Changsha 410128, China
  • Received: 2015-12-15 Online: 2017-03-20 Published: 2017-03-20

Abstract: To improve the prediction accuracy of HIV-1 protease cleavage sites, a cleavage site prediction model based on feature selection and a support vector machine (SVM) is proposed. First, a cleavage site dataset of 5830 samples is analyzed and, drawing on the minimum redundancy maximum relevance (mRMR) concept, a feature selection method with automatic termination is used to select the cleavage site feature vectors. Then, the selected feature vectors are input to an SVM for training to build the cleavage site classification model. Finally, simulations are carried out using the MATLAB 2004 simulation toolbox. The results show that the proposed model achieves higher prediction accuracy than the reference models and the results reported in the literature. The selected features also have good interpretability and biological significance.
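
The mRMR idea named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation (the abstract reports MATLAB simulations and does not give the selection criterion in detail). It is a generic greedy mRMR over discrete features, using mutual information as the relevance and redundancy measure. The function names `mutual_information` and `mrmr_select`, the toy data, and the stopping rule (stop when the best remaining score is not positive) are all assumptions made for illustration. The SVM training step is omitted.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    n = len(xs)
    px, py, pxy = Counter(xs), Counter(ys), Counter(zip(xs, ys))
    # Sum p(x, y) * log2(p(x, y) / (p(x) * p(y))) over observed pairs.
    return sum(
        (c / n) * log2(c * n / (px[x] * py[y]))
        for (x, y), c in pxy.items()
    )


def mrmr_select(columns, labels, eps=1e-12):
    """Greedy mRMR over feature columns; returns indices in selection order.

    Score = relevance to the labels minus mean redundancy with the
    features already selected. Selection stops when the best remaining
    score is not positive (an assumed stand-in for automatic termination).
    """
    relevance = [mutual_information(col, labels) for col in columns]
    selected = []
    remaining = set(range(len(columns)))

    def score(j):
        if not selected:
            return relevance[j]
        redundancy = sum(
            mutual_information(columns[j], columns[s]) for s in selected
        ) / len(selected)
        return relevance[j] - redundancy

    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        best = max(sorted(remaining), key=score)
        if score(best) <= eps:
            break
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (hypothetical): column 0 matches the labels, column 1 duplicates
# column 0, and column 2 is independent of the labels.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
columns = [
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 0, 0, 0, 1, 1, 1, 1],
    [0, 1, 0, 1, 0, 1, 0, 1],
]
selected = mrmr_select(columns, labels)
print(selected)
```

On this toy data the duplicate column is rejected as redundant and the independent column as irrelevant, so only one feature is kept. In a full pipeline the retained columns would then be passed to an SVM classifier.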

Key words: biophysics, cleavage site prediction, feature selection, minimal redundancy maximal relevance (mRMR), support vector machine (SVM)

CLC Number: Q6