Journal of Jilin University (Medicine Edition), 2025, Vol. 51, Issue (2): 437-446. doi: 10.13481/j.1671-587X.20250218

• Research in clinical medicine •

Bioinformatics analysis on differentially expressed genes in multiple primary lung cancers based on GEO database

Bo LIU, Chao SUN, Xu WANG, Kewei MA

  1. Tumor Center, First Hospital, Jilin University, Changchun 130021, China
  • Received: 2024-04-07 Accepted: 2024-05-20 Online: 2025-03-28 Published: 2025-04-22
  • Contact: Kewei MA, E-mail: makw@jlu.edu.cn

Abstract:

Objective To screen for differentially expressed genes (DEGs) in multiple primary lung cancers (MPLCs) using bioinformatics methods, and to analyze their biological functions and their influence on the prognosis of lung adenocarcinoma. Methods Single-cell transcriptome sequencing data (GSE200972) were downloaded from the Gene Expression Omnibus (GEO) database. After preliminary processing with R software, the Seurat R package was used for data processing, cell clustering, and annotation. The clusterProfiler R package was used for Gene Set Enrichment Analysis (GSEA). The STRING database and Cytoscape software were used to construct the protein-protein interaction (PPI) network and to screen for key genes (Hub genes). Gene expression levels in the lung adenocarcinoma dataset were analyzed using the Gene Expression Profiling Interactive Analysis (GEPIA) database. Real-time fluorescence quantitative PCR (RT-qPCR) was used to detect gene expression in tumor tissue from A549 xenograft mice and in lung tissue from normal mice. Kaplan-Meier Plotter was used for prognosis analysis. Results Cell clustering analysis identified seven cell types: epithelial cells, endothelial cells, fibroblasts, T cells and natural killer (NK) cells, B cells, myeloid cells, and mast cells. A total of 14 605 DEGs were identified between tumor epithelial cells and normal epithelial cells. The GSEA results revealed four activated pathways in tumor samples [myelocytomatosis oncogene (MYC) pathway, P53 pathway, oxidative phosphorylation pathway, and glycolysis pathway] and one inhibited pathway [tumor necrosis factor-α (TNF-α) and nuclear factor kappa B (NF-κB) pathway]. The Hub genes identified from the PPI network were CXC motif chemokine ligand 8 (CXCL8), glyceraldehyde-3-phosphate dehydrogenase (GAPDH), CXC motif chemokine receptor 4 (CXCR4), Kirsten rat sarcoma viral proto-oncogene (KRAS), CXC motif chemokine ligand 1 (CXCL1), C-C motif chemokine ligand 2 (CCL2), mucin 1 (MUC1), and secreted phosphoprotein 1 (SPP1). The GEPIA database analysis and the animal experiments showed that SPP1 mRNA expression was higher in non-small cell lung cancer tissue than in normal lung tissue (P<0.01). The Kaplan-Meier survival analysis indicated that patients with high SPP1 expression had shorter overall survival (OS) than those with low SPP1 expression (P<0.01). Conclusion In MPLCs, oncogene-related pathways are activated, and a tumor suppressor pathway that antagonizes tumor progression is also activated. Moreover, elevated expression of SPP1 in non-small cell lung cancer may indicate a relatively poor prognosis.
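
The survival comparison reported above relies on the Kaplan-Meier estimator. As a minimal illustration of how that estimate is computed, the sketch below implements the product-limit formula in plain Python. It is not the Kaplan-Meier Plotter tool used in the study; the function name `kaplan_meier` and the follow-up times in the usage example are hypothetical toy values, not data from the article.

```python
from collections import Counter


def kaplan_meier(times, events):
    """Product-limit survival estimate.

    times:  follow-up duration for each patient (any consistent unit)
    events: 1 if death was observed at that time, 0 if the patient was censored
    Returns a list of (time, survival probability) at each observed death time.
    """
    deaths = Counter(t for t, e in zip(times, events) if e)
    leaving = Counter(times)  # patients whose follow-up ends at each time
    at_risk = len(times)
    survival = 1.0
    curve = []
    for t in sorted(leaving):
        d = deaths.get(t, 0)
        if d:
            # Multiply by the conditional probability of surviving past t.
            survival *= 1 - d / at_risk
            curve.append((t, survival))
        # Deaths and censored patients both leave the risk set after t.
        at_risk -= leaving[t]
    return curve


# Hypothetical toy cohorts (illustrative only, not study data).
high_expression = kaplan_meier([5, 8, 12, 20], [1, 1, 1, 0])
low_expression = kaplan_meier([10, 25, 30, 40], [1, 0, 1, 0])
print("high:", high_expression)
print("low: ", low_expression)
```

In practice the two curves would be compared with a log-rank test, which is the usual source of a P value such as the one reported; that step is omitted here for brevity.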

Key words: Multiple primary lung cancer, Single-cell transcriptome sequencing, Bioinformatics, Gene set enrichment analysis, Survival analysis

CLC Number: 

  • R734.2