Journal of Jilin University (Engineering and Technology Edition), 2023, Vol. 53, Issue 11: 3238-3245. doi: 10.13229/j.cnki.jdxbgxb.20220007

A model for identifying neuropeptides by feature selection based on hybrid features

Feng-feng ZHOU1, Zhen-wei YAN2

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. College of Software, Jilin University, Changchun 130012, China
  • Received: 2022-01-04  Online: 2023-11-01  Published: 2023-12-06

Abstract:

This study proposes an integrated neuropeptide prediction algorithm built in three layers. Nine feature descriptors are combined with five machine learning algorithms to generate 45 baseline models trained to predict neuropeptides. The first layer applies feature selection to these 45 baseline models and keeps the features that perform well. The second layer selects eight base learners according to the accuracy of each baseline model and the sum of Pearson correlation coefficients. The third layer feeds the outputs of these learners into candidate classifiers such as logistic regression and Extreme Gradient Boosting (XGBoost), selects among them to train the final model, and uses its output as the final prediction. The final accuracy on the test dataset is 0.9169, higher than that of the existing models compared in Table 4 (PredNeuroP, 0.8969; NeuroPIpred, 0.5360).

Key words: computer application technology, neuropeptide, feature selection, machine learning, stacking method

CLC Number: 

  • TP399
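
The second-layer selection described in the abstract can be sketched as follows. This is only an illustration under stated assumptions: the abstract does not give the exact rule for combining accuracy with the sum of Pearson correlation coefficients, so the rank-sum rule below, the function names, and the algorithm labels `ALG1` to `ALG5` are all hypothetical. The descriptor abbreviations are those listed in Table 1.

```python
from itertools import product
from math import sqrt

# Descriptor abbreviations from Table 1; the five algorithm labels are
# placeholders, since the abstract does not name the five algorithms.
DESCRIPTORS = ["AAC", "DPC", "BPNC", "AAI", "GAAC", "GDPC", "GTPC", "CTD", "AAE"]
ALGORITHMS = ["ALG1", "ALG2", "ALG3", "ALG4", "ALG5"]


def baseline_names():
    """Enumerate the 9 x 5 = 45 descriptor/algorithm combinations."""
    return [f"{d}+{a}" for d, a in product(DESCRIPTORS, ALGORITHMS)]


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def select_base_models(accuracy, outputs, k=8):
    """Pick k models that are accurate and weakly correlated with the others.

    accuracy: dict mapping model name to validation accuracy.
    outputs:  dict mapping model name to its predicted scores on the same samples.
    ASSUMPTION: models are ranked once by accuracy (higher is better) and once
    by their summed absolute correlation with all other models (lower is
    better); the two ranks are added and the k smallest totals are kept.
    """
    names = list(accuracy)
    corr_sum = {
        a: sum(abs(pearson(outputs[a], outputs[b])) for b in names if b != a)
        for a in names
    }
    by_acc = sorted(names, key=lambda m: -accuracy[m])
    by_corr = sorted(names, key=lambda m: corr_sum[m])
    rank = {m: by_acc.index(m) + by_corr.index(m) for m in names}
    return sorted(names, key=lambda m: (rank[m], -accuracy[m]))[:k]
```

Preferring weakly correlated models is a common motivation for this kind of criterion, because a meta-learner gains little from base learners that make the same errors; whether the paper weights the two criteria this way is not stated in the abstract.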

Fig.1

Flow chart of the system

Table 1

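Summary of nine feature engineering algorithms

Feature strategy                         Feature construction method          Abbreviation  Vector dimension
Composition-based features               Amino acid composition               AAC           60
Composition-based features               Dipeptide composition                DPC           400
Binary profile-based features            Binary profile feature               BPNC          200
Physicochemical property-based features  Amino acid index feature             AAI           36
Physicochemical property-based features  Grouped amino acid composition       GAAC          15
Physicochemical property-based features  Grouped dipeptide composition        GDPC          25
Physicochemical property-based features  Grouped tripeptide composition       GTPC          125
Physicochemical property-based features  Composition-transition-distribution  CTD           147
Position-based features                  Amino acid entropy                   AAE           60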
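
As an illustration of the composition-based descriptors in Table 1, the sketch below computes a plain amino acid composition (AAC) and dipeptide composition (DPC) vector for one peptide sequence. It is a minimal sketch rather than the paper's implementation: the function names are hypothetical, and the whole-sequence AAC here has 20 dimensions, whereas Table 1 lists 60, so the paper's variant presumably concatenates more than one composition vector in a way this page does not specify.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
DIPEPTIDES = ["".join(p) for p in product(AMINO_ACIDS, repeat=2)]  # 400 pairs


def _standard_residues(seq):
    """Upper-case the sequence and drop any non-standard characters.

    Note: dropping a character makes its two neighbours adjacent for DPC.
    """
    return [c for c in seq.upper() if c in AMINO_ACIDS]


def aac(seq):
    """Amino acid composition: frequency of each of the 20 residues."""
    res = _standard_residues(seq)
    if not res:
        return [0.0] * len(AMINO_ACIDS)
    return [res.count(a) / len(res) for a in AMINO_ACIDS]


def dpc(seq):
    """Dipeptide composition: frequency of each of the 400 adjacent pairs."""
    res = _standard_residues(seq)
    pairs = [a + b for a, b in zip(res, res[1:])]
    if not pairs:
        return [0.0] * len(DIPEPTIDES)
    counts = {}
    for p in pairs:
        counts[p] = counts.get(p, 0) + 1
    return [counts.get(d, 0) / len(pairs) for d in DIPEPTIDES]
```

Calling `aac(seq) + dpc(seq)` would give a 420-dimensional hybrid vector for a single sequence; the DPC length of 400 matches the dimension reported in Table 1.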

Table 2

Influence of meta model on model accuracy

Method   MCC     ACC     AUC     PRAUC
LR       0.7938  0.8969  0.9543  0.9626
ANN      0.7917  0.8958  0.9536  0.9619
ERT      0.7958  0.8979  0.9520  0.9610
KNN      0.7796  0.8896  0.9339  0.9434
XGBoost  0.7939  0.9064  0.9500  0.9676
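
For readers unfamiliar with the metrics in Table 2, accuracy (ACC) and the Matthews correlation coefficient (MCC) can be computed from the four confusion-matrix counts using their standard definitions. The sketch below is illustrative only; the function names and the example counts are hypothetical and are not taken from the paper.

```python
from math import sqrt


def accuracy(tp, tn, fp, fn):
    """Fraction of samples classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when a margin is empty."""
    denom = sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


# Hypothetical balanced test set: 50 positives and 50 negatives,
# with 5 errors in each class.
example_acc = accuracy(tp=45, tn=45, fp=5, fn=5)
example_mcc = mcc(tp=45, tn=45, fp=5, fn=5)
```

On a balanced set with the same number of errors in each class, MCC equals 2 × ACC − 1, which is why an MCC near 0.79 can accompany an ACC near 0.90.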

Table 3

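Influence of feature selection method on model accuracy

Method                               ACC
Variance selection                   0.8986
Correlation coefficient              0.9075
Distance correlation coefficient     0.9093
Chi-square test                      0.9083
Mutual information                   0.9087
Recursive feature elimination        0.9094
L1 regularization / Lasso            0.9104
L2 regularization / Ridge regression 0.9126
RF                                   0.9169
Relief                               0.9149
GBDT                                 0.9138
PCA                                  0.9115
LDA                                  0.9129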
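
To make the filter-style entries in Table 3 concrete, the sketch below ranks features by the absolute Pearson correlation between each feature column and the class label and keeps the top k. It illustrates only the "correlation coefficient" row, not the RF-based selection that scored best, and it is an assumption-laden sketch: the function names, the choice of absolute correlation, and the value of k are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / sqrt(vx * vy)


def top_k_by_correlation(X, y, k):
    """Return the sorted column indices of the k features most correlated with y.

    X: list of samples, each a list of feature values.
    y: list of numeric class labels (for example 0 and 1).
    """
    n_features = len(X[0])
    scores = [
        abs(pearson([row[j] for row in X], y)) for j in range(n_features)
    ]
    order = sorted(range(n_features), key=lambda j: -scores[j])
    return sorted(order[:k])


def keep_columns(X, columns):
    """Project every sample onto the selected feature columns."""
    return [[row[j] for j in columns] for row in X]
```

A filter like this scores each feature independently of any classifier, whereas embedded methods in the table such as RF or GBDT derive importance from a fitted model.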

Table 4

Performance comparison with other models on the test data set

Method       ACC     MCC     AUC
This work    0.9169  0.9010  0.9770
PredNeuroP   0.8969  0.7940  0.9540
NeuroPIpred  0.5360  0.0740  0.5810