Journal of Jilin University(Medicine Edition) ›› 2022, Vol. 48 ›› Issue (2): 426-435.doi: 10.13481/j.1671-587X.20220220
• Research in clinical medicine •
Chengsheng LI1, Qihan BAO1, Xiaoyan HAO2, Qingzhong PAN3, Suzhen WANG1, Fuyan SHI1
Received: 2021-07-21
Online: 2022-03-28
Published: 2022-05-10
Contact: Suzhen WANG, E-mail: wangsz@wfmc.edu.cn
Chengsheng LI,Qihan BAO,Xiaoyan HAO,Qingzhong PAN,Suzhen WANG,Fuyan SHI. Establishment of prediction model for postoperative pancreatic cancer based on random forest algorithm[J].Journal of Jilin University(Medicine Edition), 2022, 48(2): 426-435.
Tab. 1
Prognostic variable information for pancreatic cancer patients
Variable | Assignment | Records in SEER database | Number | Percentage (η/%) |
---|---|---|---|---|
Age (year) | 1 | ≤49 | 253 | 6.3 |
Age (year) | 2 | 50-59 | 789 | 19.6 |
Age (year) | 3 | 60-69 | 1 414 | 35.2 |
Age (year) | 4 | 70-79 | 1 180 | 29.4 |
Age (year) | 5 | ≥80 | 384 | 9.6 |
Gender | 1 | Female | 1 962 | 48.8 |
Gender | 2 | Male | 2 058 | 51.2 |
Race | 1 | Black | 413 | 10.3 |
Race | 2 | White | 3 263 | 81.2 |
Race | 3 | Other | 344 | 8.6 |
Primary site | 1 | Pancreatic head | 3 063 | 76.2 |
Primary site | 2 | Pancreatic body | 274 | 6.8 |
Primary site | 3 | Pancreatic tail | 355 | 8.8 |
Primary site | 4 | Pancreatic overlap | 179 | 4.5 |
Primary site | 5 | Other | 149 | 3.7 |
Grade | 1 | GradeⅠ | 344 | 8.6 |
Grade | 2 | GradeⅡ | 2 022 | 50.3 |
Grade | 3 | GradeⅢ | 1 619 | 40.3 |
Grade | 4 | GradeⅣ | 35 | 0.9 |
Chemotherapy | 1 | No | 1 157 | 28.8 |
Chemotherapy | 2 | Yes | 2 863 | 71.2 |
Radiotherapy | 1 | No | 2 759 | 68.6 |
Radiotherapy | 2 | Yes | 1 261 | 31.4 |
Number of lymph node dissections | 1 | 0-3 | 180 | 4.5 |
Number of lymph node dissections | 2 | ≥4 | 3 840 | 95.5 |
T stage | 1 | T1 | 154 | 3.8 |
T stage | 2 | T2 | 354 | 8.8 |
T stage | 3 | T3 | 3 296 | 82.0 |
T stage | 4 | T4 | 216 | 5.4 |
N stage | 1 | N0 | 1 062 | 26.4 |
N stage | 2 | N1 | 2 958 | 73.6 |
M stage | 1 | M0 | 3 785 | 94.2 |
M stage | 2 | M1 | 235 | 5.8 |
Marital status | 1 | Married | 2 604 | 64.8 |
Marital status | 2 | Single | 514 | 12.8 |
Marital status | 3 | Other | 902 | 22.4 |
Tumor size (l/mm) | 1-540 | - | - | - |
Lymph node positive ratio | 0-1 | - | - | - |
“-”: No data. |
Tab. 2
Comparison of prognostic factors between training set and test set
Variable | Training set (n=2 814) | Test set (n=1 206) | Statistic | P |
---|---|---|---|---|
Age (year) | | | Z=-0.553 | 0.580 |
≤49 | 168 | 77 | | |
50-59 | 561 | 210 | | |
60-69 | 990 | 441 | | |
70-79 | 821 | 375 | | |
≥80 | 274 | 103 | | |
Gender | | | χ2=0.088 | 0.767 |
Female | 1 388 | 601 | | |
Male | 1 426 | 605 | | |
Race | | | χ2=1.233 | 0.540 |
Black | 285 | 135 | | |
White | 2 278 | 970 | | |
Other | 251 | 101 | | |
Primary site | | | χ2=4.405 | 0.354 |
Pancreatic head | 2 145 | 938 | | |
Pancreatic body | 192 | 78 | | |
Pancreatic tail | 247 | 110 | | |
Pancreatic overlap | 124 | 37 | | |
Other | 106 | 43 | | |
Grade | | | Z=-0.234 | 0.815 |
GradeⅠ | 246 | 110 | | |
GradeⅡ | 1 433 | 612 | | |
GradeⅢ | 1 111 | 474 | | |
GradeⅣ | 24 | 10 | | |
Chemotherapy | | | χ2=0.011 | 0.915 |
No | 812 | 346 | | |
Yes | 2 002 | 860 | | |
Radiotherapy | | | χ2=0.658 | 0.417 |
No | 1 922 | 808 | | |
Yes | 892 | 398 | | |
Number of lymph node dissections | | | χ2=0.262 | 0.609 |
0-3 | 125 | 58 | | |
≥4 | 2 689 | 1 148 | | |
T stage | | | Z=-0.991 | 0.322 |
T1 | 105 | 40 | | |
T2 | 259 | 97 | | |
T3 | 2 293 | 1 004 | | |
T4 | 157 | 65 | | |
N stage | | | χ2=0.193 | 0.661 |
N0 | 728 | 320 | | |
N1 | 2 086 | 886 | | |
M stage | | | χ2=0.683 | 0.409 |
M0 | 2 646 | 1 142 | | |
M1 | 168 | 64 | | |
Marital status | | | χ2=4.613 | 0.100 |
Married | 1 826 | 763 | | |
Single | 352 | 181 | | |
Other | 636 | 262 | | |
Tumor size | 2 814 | 1 206 | Z=-0.002 | 0.998 |
Lymph node positive ratio | 2 814 | 1 206 | Z=-0.257 | 0.797 |
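
The χ2 values in Tab. 2 compare category counts between the training and test sets. As a minimal sketch, assuming an uncorrected Pearson chi-square test (the page does not state the exact procedure or software), the Gender row can be recomputed from its four counts. The function name `pearson_chi_square` is illustrative, not taken from the article.

```python
from typing import Sequence


def pearson_chi_square(table: Sequence[Sequence[float]]) -> float:
    """Pearson chi-square statistic for an r x c contingency table (no continuity correction)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column variables.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected
    return statistic


# Gender counts from Tab. 2: rows are Female, Male; columns are training set, test set.
gender_counts = [[1388, 601], [1426, 605]]
gender_chi2 = pearson_chi_square(gender_counts)
print(round(gender_chi2, 3))
```

Under that assumption the Gender statistic agrees with the tabulated χ2=0.088 to three decimals. The Z statistics for ordinal and continuous variables come from a different test and are not covered by this sketch.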
Tab. 3
Single factor analysis results of training set
Variable | Statistic | P | Variable | Statistic | P |
---|---|---|---|---|---|
Age (year) | Z=-2.841 | 0.004 | Chemotherapy | χ2=11.289 | 0.001 |
≤49 | | | No | | |
50-59 | | | Yes | | |
60-69 | | | Radiotherapy | χ2=11.033 | 0.001 |
70-79 | | | No | | |
≥80 | | | Yes | | |
Gender | χ2=0.118 | 0.731 | Number of lymph node dissections | χ2=0.064 | 0.800 |
Female | | | 0-3 | | |
Male | | | ≥4 | | |
Race | χ2=0.066 | 0.968 | T stage | Z=-5.501 | <0.001 |
Black | | | T1 | | |
White | | | T2 | | |
Other | | | T3 | | |
Primary site | χ2=1.025 | 0.906 | T4 | | |
Pancreatic head | | | N stage | χ2=110.965 | <0.001 |
Pancreatic body | | | N0 | | |
Pancreatic tail | | | N1 | | |
Pancreatic overlap | | | M stage | χ2=4.387 | 0.340 |
Other | | | M0 | | |
Grade | Z=-12.723 | <0.001 | M1 | | |
GradeⅠ | | | Marital status | χ2=2.160 | 0.340 |
GradeⅡ | | | Married | | |
GradeⅢ | | | Single | | |
GradeⅣ | | | Other | | |
Tumor size | Z=-6.460 | <0.001 | | | |
Lymph node positive ratio | Z=-10.771 | <0.001 | | | |
Tab. 4
Results of multivariate Logistic regression analysis of training set
Variable | B | Std. Error | Wald | P | Exp(B) | 95% CI for Exp(B) |
---|---|---|---|---|---|---|
Grade | -0.425 | 0.120 | 12.599 | <0.001 | 0.654 | (0.517,0.822) |
Chemotherapy | 0.597 | 0.211 | 7.982 | 0.005 | 1.816 | (1.201,2.784) |
Radiotherapy | 0.335 | 0.169 | 3.937 | 0.047 | 1.398 | (1.004,1.947) |
T stage | -0.268 | 0.119 | 5.048 | 0.025 | 0.765 | (0.606,0.966) |
N stage | -0.595 | 0.214 | 7.720 | 0.005 | 0.552 | (0.363,0.839) |
Tumor size | -0.023 | 0.006 | 12.570 | <0.001 | 0.978 | (0.965,0.990) |
Lymph node positive rate | -3.980 | 0.927 | 18.432 | <0.001 | 0.019 | (0.003,0.115) |
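
The columns of Tab. 4 are linked by standard logistic regression identities: Exp(B) is the odds ratio exp(B), the Wald statistic is (B / Std. Error)², and a 95% interval is commonly taken as exp(B ± 1.96 × Std. Error). The sketch below applies those identities to the Grade row. It assumes a normal-approximation interval, which the page does not confirm, and the helper name `odds_ratio_summary` is hypothetical.

```python
import math


def odds_ratio_summary(b: float, std_error: float, z: float = 1.96) -> dict:
    """Derive odds ratio, Wald statistic, and an approximate 95% CI from a coefficient."""
    return {
        "exp_b": math.exp(b),                     # odds ratio
        "wald": (b / std_error) ** 2,             # Wald chi-square statistic
        "ci_low": math.exp(b - z * std_error),    # lower bound of the CI for Exp(B)
        "ci_high": math.exp(b + z * std_error),   # upper bound of the CI for Exp(B)
    }


# Grade row of Tab. 4: B = -0.425, Std. Error = 0.120.
grade = odds_ratio_summary(-0.425, 0.120)
for name, value in grade.items():
    print(f"{name}: {value:.3f}")
```

Because the inputs are the rounded B and Std. Error printed in the table, recomputed values can differ from the tabulated ones in the last decimal place.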
Tab. 5
SMOTE datasets
SMOTE dataset | Perc.over | Perc.under | Negative sample size | Positive sample size | Class.error(-) | Class.error(+) | OOB error(%) |
---|---|---|---|---|---|---|---|
Training set | - | - | 2 618 | 196 | 0.001 | 0.995 | 7.00 |
1 | 1 300 | 100 | 2 548 | 2 744 | 0.082 | 0.122 | 10.26 |
2 | 1 200 | 100 | 2 352 | 2 548 | 0.085 | 0.113 | 9.94 |
3 | 1 100 | 100 | 2 156 | 2 352 | 0.089 | 0.115 | 10.27 |
4 | 1 000 | 100 | 1 960 | 2 156 | 0.102 | 0.122 | 11.20 |
5 | 900 | 100 | 1 764 | 1 960 | 0.109 | 0.132 | 12.11 |
6 | 800 | 100 | 1 568 | 1 764 | 0.116 | 0.128 | 12.24 |
7 | 700 | 100 | 1 372 | 1 568 | 0.141 | 0.140 | 14.01 |
8 | 600 | 100 | 1 176 | 1 372 | 0.153 | 0.144 | 14.80 |
9 | 500 | 100 | 980 | 1 176 | 0.173 | 0.147 | 15.91 |
10 | 400 | 100 | 784 | 980 | 0.203 | 0.158 | 17.80 |
11 | 300 | 100 | 588 | 784 | 0.248 | 0.124 | 17.71 |
12 | 200 | 100 | 392 | 588 | 0.316 | 0.151 | 21.73 |
13 | 100 | 100 | 196 | 392 | 0.398 | 0.110 | 20.58 |
14 | 600 | 200 | 2 352 | 1 372 | 0.064 | 0.219 | 12.14 |
15 | 500 | 200 | 1 960 | 1 176 | 0.068 | 0.222 | 12.60 |
16 | 400 | 200 | 1 568 | 980 | 0.101 | 0.237 | 15.31 |
17 | 300 | 200 | 1 176 | 784 | 0.102 | 0.236 | 15.56 |
18 | 200 | 200 | 784 | 588 | 0.159 | 0.260 | 20.26 |
19 | 100 | 200 | 392 | 392 | 0.306 | 0.242 | 27.42 |
“-”: No data. |
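
The sample sizes in Tab. 5 follow from the 196 positive cases in the training set and the two percentages. The arithmetic below is a sketch that assumes the convention used by SMOTE implementations with `perc.over` and `perc.under` parameters: `perc.over` sets how many synthetic positives are generated per original positive, and `perc.under` sets how many negatives are kept per synthetic positive. The function name `smote_class_sizes` is illustrative.

```python
from typing import Tuple


def smote_class_sizes(n_minority: int, perc_over: int, perc_under: int) -> Tuple[int, int]:
    """Return (negative, positive) sample sizes after SMOTE over- and under-sampling."""
    # Synthetic minority cases: perc_over / 100 new cases per original minority case.
    synthetic = n_minority * perc_over // 100
    positives = n_minority + synthetic
    # Majority cases retained: perc_under / 100 per synthetic minority case.
    negatives = synthetic * perc_under // 100
    return negatives, positives


# Dataset 1 in Tab. 5: Perc.over = 1 300, Perc.under = 100.
print(smote_class_sizes(196, 1300, 100))
# Dataset 14 in Tab. 5: Perc.over = 600, Perc.under = 200.
print(smote_class_sizes(196, 600, 200))
```

Under this convention the computed pairs match the Negative and Positive sample size columns for every row of the table.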
Tab. 6
Ranking of importance of variables
Variable | Mean decrease accuracy | Mean decrease Gini |
---|---|---|
Lymph node positive rate | 0.147 | 828.022 |
N stage | 0.133 | 224.008 |
Tumor size | 0.100 | 573.534 |
T stage | 0.057 | 149.560 |
Age | 0.032 | 151.944 |
Grade | 0.028 | 105.018 |
Primary site | 0.026 | 112.608 |
Marital status | 0.014 | 75.512 |
Radiotherapy | 0.013 | 42.982 |
Race | 0.012 | 62.160 |
Chemotherapy | 0.010 | 41.365 |
Gender | 0.007 | 39.164 |
Number of lymph node dissections | 0.005 | 24.583 |
M stage | 0.002 | 12.427 |
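
The variable sets evaluated for the random forest models appear to be nested subsets of this ranking: the top 11, 10, 9, and 8 variables by mean decrease accuracy. That reading is inferred from the tables rather than stated on the page, so the following is only a sketch of the selection step; `top_k_variables` is a hypothetical helper.

```python
from typing import Dict, List

# Mean decrease accuracy values copied from Tab. 6.
mean_decrease_accuracy: Dict[str, float] = {
    "Lymph node positive rate": 0.147, "N stage": 0.133, "Tumor size": 0.100,
    "T stage": 0.057, "Age": 0.032, "Grade": 0.028, "Primary site": 0.026,
    "Marital status": 0.014, "Radiotherapy": 0.013, "Race": 0.012,
    "Chemotherapy": 0.010, "Gender": 0.007,
    "Number of lymph node dissections": 0.005, "M stage": 0.002,
}


def top_k_variables(importance: Dict[str, float], k: int) -> List[str]:
    """Return the k variables with the largest importance, most important first."""
    ranked = sorted(importance, key=importance.get, reverse=True)
    return ranked[:k]


# Candidate variable sets of decreasing size (assumed sizes: 11, 10, 9, 8).
variable_sets = {k: top_k_variables(mean_decrease_accuracy, k) for k in (11, 10, 9, 8)}
for k, names in variable_sets.items():
    print(k, names)
```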
Tab. 7
Evaluation index results of each model
Model | Variable set | Sensitivity | Specificity | G-mean | AUC |
---|---|---|---|---|---|
RF1 | Variable set 1: Lymph node positive rate, N stage, Tumor size, T stage, Age, Grade, Primary site, Marital status, Radiotherapy, Race, Chemotherapy | 0.888 | 0.774 | 0.829 | 0.831 |
RF2 | Variable set 2: Lymph node positive rate, N stage, Tumor size, T stage, Age, Grade, Primary site, Marital status, Radiotherapy, Race | 0.891 | 0.774 | 0.830 | 0.833 |
RF3 | Variable set 3: Lymph node positive rate, N stage, Tumor size, T stage, Age, Grade, Primary site, Marital status, Radiotherapy | 0.887 | 0.774 | 0.829 | 0.830 |
RF4 | Variable set 4: Lymph node positive rate, N stage, Tumor size, T stage, Age, Grade, Primary site, Marital status | 0.889 | 0.750 | 0.817 | 0.820 |
Tab. 9
Model comparison results
Model | Sensitivity | Specificity | G-mean | AUC | P | 95%CI |
---|---|---|---|---|---|---|
Logistic regression | 0.740 | 0.643 | 0.690 | 0.738 | <0.05 | (0.679,0.792) |
Support vector machine | 0.746 | 0.583 | 0.659 | 0.665 | <0.05 | (0.610,0.719) |
Decision tree | 0.791 | 0.583 | 0.679 | 0.687 | <0.05 | (0.633,0.742) |
Artificial neural network | 0.625 | 0.762 | 0.690 | 0.720 | <0.05 | (0.677,0.789) |
RF2 | 0.891 | 0.774 | 0.830 | 0.833 | <0.05 | (0.784,0.876) |
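
The G-mean column is the geometric mean of sensitivity and specificity, a summary often used with imbalanced classes because it is low whenever either class is poorly recognized. A short sketch that recomputes it from the tabulated sensitivity and specificity (the function name `g_mean` is illustrative):

```python
import math


def g_mean(sensitivity: float, specificity: float) -> float:
    """Geometric mean of sensitivity and specificity."""
    return math.sqrt(sensitivity * specificity)


# (sensitivity, specificity) pairs copied from Tab. 9.
models = {
    "Logistic regression": (0.740, 0.643),
    "Support vector machine": (0.746, 0.583),
    "Decision tree": (0.791, 0.583),
    "Artificial neural network": (0.625, 0.762),
    "RF2": (0.891, 0.774),
}
g_means = {name: round(g_mean(se, sp), 3) for name, (se, sp) in models.items()}
for name, value in g_means.items():
    print(f"{name}: {value:.3f}")
```

The recomputed values agree with the G-mean column to three decimals, with RF2 highest at 0.830.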