Journal of Jilin University (Information Science Edition) ›› 2025, Vol. 43 ›› Issue (4): 822-829.


Sensitive Data Mining Algorithm for Drug Information Based on Improved Apriori

MA Jie1, ZHOU Ting2, YANG Huibo1, LI Rushan3

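  1. Department of Pharmacy, Kailuan General Hospital, Tangshan 063000, Hebei, China; 2. Department of Pharmacy, Yutian County Hospital of Traditional Chinese Medicine (Yutianxian Zhongyiyuan), Tangshan 064100, Hebei, China; 3. Hebei Provincial Mine Environment Restoration and Treatment Technology Center, The Second Geological Brigade of the Geological and Mineral Exploration and Development Bureau of Hebei Province, Tangshan 063400, Hebei, China
  • Received: 2023-07-21 Online: 2025-08-15 Published: 2025-08-15
  • Corresponding author: LI Rushan (born 1986), male, native of Tangshan, Hebei; senior engineer at the Second Geological Brigade of the Geological and Mineral Exploration and Development Bureau of Hebei Province; research interests include computer technology, 3D geology, and hydrogeology, engineering geology and environmental geology. (Tel) 86-15747849631, (E-mail) wwh9631@126.com. E-mail: mmjg63@126.com
  • About the author: MA Jie (born 1988), female, native of Shijiazhuang; pharmacist-in-charge at Kailuan General Hospital; research interests include drug informatics, hospital pharmacy, and clinical pharmacology. (Tel) 86-15373588302, (E-mail) mmjg63@126.com
  • Funding:
    Hebei Province 2023 Medical Science Research Project Plan (20231853)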

Sensitive Data Mining Algorithm of Drug Information Based on Improved Apriori

MA Jie1, ZHOU Ting2, YANG Huibo1, LI Rushan3   

  1. Department of Pharmacy, Kailuan General Hospital, Tangshan 063000, China; 2. Department of Pharmacy, Yutianxian Zhongyiyuan, Tangshan 064100, China; 3. Hebei Provincial Mine Environment Restoration and Treatment Technology Center, The Second Geological Brigade of the Geological and Mineral Exploration and Development Bureau of Hebei Province, Tangshan 063400, China
  • Received:2023-07-21 Online:2025-08-15 Published:2025-08-15

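Abstract: Drug information data are class-imbalanced, and the sensitive data they contain are numerous and poorly interpretable, which leads to poor application performance and low mining accuracy. To address these problems, a sensitive data mining algorithm for drug information based on an improved Apriori algorithm is proposed. The drug data are decomposed into several band-limited intrinsic mode functions, and the drug information data are updated and denoised. Feature subsets of the sensitive data are extracted according to the information gain of those subsets and a Monte Carlo sampling strategy, and the relationship between the hidden-layer output function and the feature subsets is analyzed. An extreme learning machine is introduced to improve the Apriori algorithm: drug combinations with significant associations are screened out and solved for, the sensitive data features corresponding to the candidate feature subsets are matched, and a sensitive data mining function is constructed. Experimental results show that, with the proposed algorithm, the data signal fluctuates with small amplitude, sensitive data can be distinguished fairly clearly, the number of incorrectly mined data items does not exceed 2, and the interpretability of the sensitive data is improved.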

Keywords: improved Apriori algorithm, data mining, sample entropy, extreme learning machine

Abstract: Drug information data are class-imbalanced and contain a large amount of poorly interpretable sensitive data, so both the application performance and the mining accuracy for sensitive data are low. To address this, a sensitive data mining algorithm for drug information based on an improved Apriori algorithm is proposed. The drug data are decomposed into several band-limited intrinsic mode functions, then updated and denoised. The feature subset of the sensitive data is extracted according to its information gain and a Monte Carlo sampling strategy, and the relationship between the hidden-layer output function and the feature subset is analyzed. An extreme learning machine is introduced to improve the Apriori algorithm, and drug combinations with significant associations are screened out and solved. The sensitive data features corresponding to the candidate feature subset are matched, and a sensitive data mining function is constructed. The experimental results show that the data signal fluctuates with small amplitude, that sensitive data can be clearly distinguished, and that no more than 2 data items are mined incorrectly, which improves the interpretability of the sensitive data.

Key words: improved Apriori algorithm, data mining, sample entropy, extreme learning machine

CLC Number:

  • TP391
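
For readers unfamiliar with the base algorithm named in the abstract, the following is a minimal sketch of the classic Apriori frequent-itemset search, not the authors' improved variant: it omits the mode decomposition, information-gain feature selection, and extreme learning machine steps. The function name `apriori`, the `min_support` parameter, and the prescription records are illustrative assumptions and do not come from the paper.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support.

    Classic Apriori only; this is an illustrative sketch, not the
    improved algorithm described in the abstract.
    """
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        # Fraction of transactions that contain every item in the itemset.
        return sum(itemset <= t for t in transactions) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in prev for sub in combinations(c, k - 1))
        }
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


# Hypothetical prescription records; the drug names are illustrative only.
records = [
    {"aspirin", "warfarin", "omeprazole"},
    {"aspirin", "warfarin"},
    {"aspirin", "omeprazole"},
    {"warfarin", "omeprazole"},
    {"aspirin", "warfarin", "omeprazole"},
]
frequent_sets = apriori(records, min_support=0.6)
```

With these five made-up records and a support threshold of 0.6, the three single drugs and the three drug pairs are frequent, while the three-drug combination (support 0.4) is not. The prune step is what keeps the candidate set small: a combination is only counted against the data if all of its smaller sub-combinations have already passed the threshold.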