Journal of Jilin University (Engineering and Technology Edition), 2024, Vol. 54, Issue (10): 2969-2977. doi: 10.13229/j.cnki.jdxbgxb.20221633

Generative adversarial autoencoder integrated voting algorithm based on mass spectral data

Feng-feng ZHOU1, Tao YU1, Yu-si FAN2

  1. College of Computer Science and Technology, Jilin University, Changchun 130012, China
  2. College of Software, Jilin University, Changchun 130012, China
  • Received: 2022-12-27 Online: 2024-10-01 Published: 2024-11-22

Abstract:

Mass spectrometry is commonly used for disease prevention and diagnosis, but mass spectrometry data contain a large number of features, and these features vary widely among diseases, which makes multi-disease diagnosis complex and difficult. To address these problems, this paper proposes msDAGVote, a generative adversarial autoencoder integrated voting algorithm based on mass spectrometry data. The feature extraction framework of msDAGVote uses a generative adversarial network built on dual autoencoders; after the network has been trained on mass spectrometry data, its generator sub-network is used for feature construction. In an evaluation on mass spectrometry datasets covering 10 different disease types, msDAGVote extracts better features than the comparison methods and significantly reduces the number of features required for classification while retaining excellent diagnostic power, with a classification AUC above 0.98 on six datasets and above 0.87 on the remaining, more challenging datasets.
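
The integrated voting idea named in the title can be illustrated with a minimal sketch. It assumes that each base selector (for example RF, GB, Ada and SHAP) returns a list of selected features and that a feature is kept when enough selectors agree on it. The function name `vote_features`, the `min_votes` threshold and the feature names are hypothetical; the exact voting rule used by msDAGVote is not given on this page.

```python
from collections import Counter


def vote_features(selections, min_votes):
    """Return features chosen by at least `min_votes` selectors, most-voted first."""
    votes = Counter()
    for chosen in selections.values():
        votes.update(set(chosen))  # each selector votes at most once per feature
    kept = [feature for feature, count in votes.items() if count >= min_votes]
    # Sort by descending vote count, then by name, so the order is deterministic.
    return sorted(kept, key=lambda feature: (-votes[feature], feature))


# Hypothetical outputs of four base selectors (feature names are made up).
selections = {
    "RF": ["mz_101", "mz_205", "mz_330"],
    "GB": ["mz_101", "mz_330", "mz_412"],
    "Ada": ["mz_101", "mz_205"],
    "SHAP": ["mz_101", "mz_330", "mz_518"],
}

selected = vote_features(selections, min_votes=3)
print(selected)  # ['mz_101', 'mz_330']
```

Raising `min_votes` yields a smaller, more conservative feature set, while lowering it approaches the union of all selectors.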

Key words: computer application, bioinformatics, mass spectrometry, feature engineering, feature selection, generative adversarial network, dual autoencoder

CLC Number: 

  • TP391

Fig.1 Flow chart of msDAGVote

Fig.2 Network structure of the feature construction module

Fig.3 Voting strategy

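Table 1 Description of datasets

Dataset    Disease type            Samples  Features  Samples per class
ST000284   Rectal cancer           148      113       84 / 64
ST000355   Breast cancer 1         211      128       76 / 135
ST000356   Breast cancer 2         134      101       31 / 103
ST000385   Lung adenocarcinoma     165      2,331     75 / 90
ST000388   Lung cancer             94       43,044    29 / 65
MTBLS354   Pneumonia               239      12,041    97 / 142
MTBLS352   Prediabetes             210      2,724     98 / 112
ST000383   Type 2 diabetes         56       106       12 / 44
Feng       Coronary heart disease  102      4,948     43 / 59
MTBLS408   Psoriasis               90       2,015     45 / 45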

Fig.4 Results of feature pre-screening

Table 2 Comparison of AUC for the integrated voting strategies

Dataset    GB      Ada     RF      SHAP    msDAGVote (Var+T-test+LinearSVC+DT)  msDAGVote (RF+GB+Ada+SHAP)
ST000284   0.8416  0.7964  0.7285  0.8607  0.8507                               0.8835
ST000355   1.0000  1.0000  1.0000  1.0000  1.0000                               1.0000
ST000356   1.0000  1.0000  1.0000  1.0000  1.0000                               1.0000
ST000385   0.8500  0.7352  0.7778  0.8421  0.8185                               0.8704
ST000388   0.8615  0.8231  0.7821  0.8462  0.8718                               0.8846
MTBLS354   0.9736  0.9691  0.9764  0.9601  0.9792                               0.9801
MTBLS352   0.8273  0.7909  0.8667  0.8614  0.8477                               0.8841
ST000383   0.8889  1.0000  0.8148  1.0000  1.0000                               1.0000
Feng       0.8194  0.8194  0.9583  0.9412  0.9583                               1.0000
MTBLS408   0.9506  0.8395  0.8642  1.0000  0.9630                               1.0000
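
For readers less familiar with the metric, AUC equals the probability that a randomly chosen positive sample receives a higher score than a randomly chosen negative one, with ties counted as one half. The sketch below computes it directly from that definition. The function name `auc_score` and the labels and scores are hypothetical and are not taken from the datasets above; how the paper computed its AUC values is not stated on this page.

```python
def auc_score(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Hypothetical classifier scores for three positive and three negative samples.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]

auc = auc_score(labels, scores)
print(f"{auc:.4f}")  # 0.8889 (8 of 9 pairs ranked correctly)
```

The pairwise loop is quadratic in the number of samples, which is acceptable for a small illustration; library implementations typically use a sort-based computation instead.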

Table 3 Comparison of the number of features for the integrated voting strategies

Dataset    GB  Ada  RF  SHAP  msDAGVote (Var+T-test+LinearSVC+DT)  msDAGVote (RF+GB+Ada+SHAP)
ST000284   57291819530
ST000355   2   2    1   4     5                                    4
ST000356   1   1    1   1     1                                    1
ST000385   68112541456
ST000388   535533291239
MTBLS354   2   4    2   3     2                                    3
MTBLS352   362179578066
ST000383   16956710867
Feng       5   1    3   1     1                                    3
MTBLS408   84345325048

Fig.5 Comparison of integrated voting strategies

Fig.6 Comparison of AUC between msDAGVote and no feature engineering

Table 4 Comparison of the number of features between msDAGVote and no feature engineering

Dataset    No feature engineering  msDAGVote
ST000284   113                     30
ST000355   128                     4
ST000356   101                     1
ST000385   2,331                   56
ST000388   43,044                  39
MTBLS354   12,041                  3
MTBLS352   2,724                   66
ST000383   106                     67
Feng       4,948                   3
MTBLS408   2,015                   48

Table 5 Comparison with other papers

           msDAGVote                             Compared literature
Dataset    Precision  Recall  Accuracy  AUC      AUC
ST000284   0.6667     0.6154  0.7000    0.8835   0.820
ST000355   1.0000     0.9643  0.9767    1.0000   0.978
ST000356   0.9524     0.9524  0.9259    1.0000   0.990
ST000385   0.8235     0.7778  0.7879    0.8704   0.660
ST000388   0.8125     1.0000  0.8421    0.8846   0.570
MTBLS354   0.9667     1.0000  0.9792    0.9801   0.980
MTBLS352   0.7391     0.7727  0.7381    0.8841   0.800
ST000383   0.9000     1.0000  0.9167    1.0000   0.925
Feng       0.9000     0.7500  0.8095    1.0000   0.980
MTBLS408   0.9000     1.0000  0.9444    1.0000   0.945
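
Precision, recall and accuracy all follow from the four counts of a binary confusion matrix. The sketch below uses hypothetical counts (13 true positives, 3 false positives, 0 false negatives, 3 true negatives), chosen only because they happen to reproduce the values reported for ST000388. The actual confusion matrix and test-set size are not given on this page, so these counts are an assumption.

```python
def classification_metrics(tp, fp, fn, tn):
    """Return (precision, recall, accuracy) from binary confusion-matrix counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # share of predicted positives that are correct
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # share of actual positives that are found
    accuracy = (tp + tn) / (tp + fp + fn + tn)        # share of all samples classified correctly
    return precision, recall, accuracy


# Hypothetical counts, not taken from the paper.
precision, recall, accuracy = classification_metrics(tp=13, fp=3, fn=0, tn=3)
print(f"{precision:.4f} {recall:.4f} {accuracy:.4f}")  # 0.8125 1.0000 0.8421
```

The example also shows why a recall of 1.0000 can coexist with a lower accuracy: no positive sample is missed, but three negatives are misclassified as positive.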

Fig.7 Comparison with other papers
