Journal of Jilin University (Medicine Edition), 2025, Vol. 51, Issue (4): 1039-1051. doi: 10.13481/j.1671-587X.20250420

• Research in clinical medicine •

Construction of diagnostic model for Alzheimer’s disease and immune analysis based on bioinformatics and machine learning

Linrui XU1, Yiyu ZHANG1, Jiaqi CUI1, Xianzhu CONG1, Shuang LI1, Jiayu GE1, Yujia KONG1, Suzhen WANG1, Fuyan SHI1, Jinrong WANG2

  1. Department of Health Statistics, School of Public Health, Shandong Second Medical University, Weifang 261053, China
  2. Department of Traditional Chinese Medicine, Affiliated Hospital, Shandong Second Medical University, Weifang 261041, China
  • Received: 2024-10-16 Accepted: 2025-05-08 Online: 2025-07-28 Published: 2025-08-25
  • Contact: Suzhen WANG, Jinrong WANG E-mail: wangsz@sdsmu.edu.cn; wjrwkl@126.com

Abstract:

Objective To screen Alzheimer’s disease (AD)-related genes and construct a diagnostic model for AD using bioinformatics and machine learning (ML) algorithms, to characterize the immunological features of AD patients, and to provide novel biomarkers for AD diagnosis.

Methods The AD-related gene expression dataset GSE125583 was downloaded from the Gene Expression Omnibus (GEO) database. Differentially expressed genes (DEGs) were identified by differential analysis. Gene Ontology (GO) functional enrichment and Kyoto Encyclopedia of Genes and Genomes (KEGG) signaling pathway enrichment analyses were performed to explore the biological functions and signaling pathways of the DEGs. A protein-protein interaction (PPI) network was constructed, and hub genes were screened using Cytoscape software combined with three ML algorithms: Least Absolute Shrinkage and Selection Operator (LASSO), eXtreme Gradient Boosting (XGBoost), and Random Forest (RF). The screened hub genes were used to build an AD diagnostic model with RF, followed by feature importance ranking. The performance of the model and of the key genes was evaluated on a test set. Single-sample gene set enrichment analysis (ssGSEA) was used to compare immune cell infiltration between the AD group and the control group.

Results Differential analysis identified 1,287 DEGs. GO functional enrichment analysis showed that the DEGs were primarily involved in biological functions related to neural signaling, synapses, and vesicles. KEGG signaling pathway enrichment analysis showed significant enrichment of the DEGs in ion transport, neurotransmitter, and ligand-gated channel pathways. Nine overlapping hub genes were selected by the three ML algorithms. In the AD diagnostic model, the top four key genes with the highest diagnostic performance were adenylate cyclase-activating polypeptide 1 (ADCYAP1), brain-derived neurotrophic factor (BDNF), platelet-derived growth factor receptor β (PDGFRB), and C-X-C motif chemokine receptor 4 (CXCR4), with area under the curve (AUC) values of 0.852, 0.795, 0.820, and 0.756, respectively. The model achieved an AUC of 0.828, an accuracy of 81.25%, a sensitivity of 84.40%, and a specificity of 71.43%. Immune cell infiltration analysis showed higher infiltration of macrophages, monocytes, natural killer (NK) cells, and lymphocytes in AD tissue. Among these, NK/natural killer T (NKT) cells and plasmacytoid dendritic cells were significantly correlated with the four key genes (P < 0.05).

Conclusion The feature genes screened with bioinformatics and ML show diagnostic potential for AD. Genes such as ADCYAP1 may serve as potential biomarkers for AD diagnosis, which may inform early prevention and treatment.
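
As a reading aid for the evaluation metrics named in the abstract (AUC, accuracy, sensitivity, specificity), the following is a minimal Python sketch of how such values are typically computed for a binary AD-versus-control classifier. It is not the authors' code. The function names, the 0.5 decision threshold, and the toy labels and scores are all assumptions made for illustration and do not come from the study.

```python
from __future__ import annotations

from typing import Sequence


def confusion_counts(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[int, int, int, int]:
    """Return (tp, fn, tn, fp) for binary labels, with 1 = AD and 0 = control."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    return tp, fn, tn, fp


def summary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> dict[str, float]:
    """Accuracy, sensitivity (true positive rate), and specificity (true negative rate)."""
    tp, fn, tn, fp = confusion_counts(y_true, y_pred)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
    }


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (AD, control) pairs in which the AD sample
    receives the higher score; tied pairs count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy data invented for illustration only (NOT taken from the study).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]  # hypothetical predicted P(AD)
y_pred = [1 if s >= 0.5 else 0 for s in scores]     # assumed 0.5 threshold

metrics = summary_metrics(y_true, y_pred)
model_auc = auc(y_true, scores)
print(metrics, model_auc)
```

Accuracy, sensitivity, and specificity depend on the chosen threshold, whereas the AUC depends only on how the scores rank AD samples relative to controls. This is one reason a model can pair a relatively high sensitivity with a lower specificity at a given cut-off.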

Key words: Bioinformatics, Machine learning, Alzheimer’s disease, Diagnostic model, Adenylate cyclase-activating polypeptide 1 gene

CLC Number: 

  • R749.16