Journal of Jilin University (Information Science Edition), 2023, Vol. 41, Issue 2: 329-337.

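Closed Frequent Itemset Mining Algorithm Based on ESCS Pruning Strategy

LIU Wenjie, YANG Haijun

  1. (School of Information Engineering, Lanzhou University of Finance and Economics, Lanzhou 730020, China)
  • Received: 2022-04-29 Online: 2023-04-13 Published: 2023-04-17
  • Corresponding author: YANG Haijun (born 1966), male, from Tianshui, Gansu; professor and master's supervisor at Lanzhou University of Finance and Economics; research interests include computer algorithms and information systems. Tel: 86 15109310609; Email: yanghj19@qq.com
  • About the author: LIU Wenjie (born 1997), female, from Binzhou, Shandong; master's student at Lanzhou University of Finance and Economics; research interests include data mining algorithms and logistics management. Tel: 86 18853850052; Email: lwj768@126.com
  • Supported by:
     Natural Science Foundation of Gansu Province (18JR3RA216; 21JR1RA283); Open Fund of the Gansu Provincial Key Laboratory of E-commerce Technology and Application (Lanzhou University of Finance and Economics) (2018GSDZSW63A14)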

Abstract: Pruning strategies in existing closed frequent itemset mining algorithms are relatively limited: most target 1-itemsets, while strategies for 2-itemsets and n-itemsets (n ≥ 3) are scarce. Because an effective pruning strategy can identify and discard a large number of unpromising itemsets early, improving the pruning strategy can substantially raise the efficiency of such algorithms. Building on the ESCS (Estimated Support Cooccurrence Structure), an ESCS pruning strategy for 2-itemsets is proposed and applied to the classical closed frequent itemset mining algorithm DCI_Closed (Direct Count Intersect Closed), yielding the DCI_ESCS (Direct Count Intersect Estimated Support Cooccurrence Structure) algorithm, which is also used to verify the effect of the ESCS pruning strategy. The running times of the algorithm before and after the improvement are compared on several public datasets under different minimum support thresholds. The experimental results show that the improved DCI_ESCS algorithm performs well on denser datasets with longer transactions and itemsets, where its time efficiency improves to a certain extent.

Key words: closed frequent itemsets, pruning strategy, data mining

CLC number:

  • TP301
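
The abstract does not describe the internals of the ESCS or of DCI_ESCS, so the following is only a generic sketch of what pruning at the 2-itemset level can look like, not the authors' method. It assumes exact pair counts (the ESCS, by its name, works with estimated supports) and relies on the standard anti-monotonicity of support: if a pair of items is infrequent, no itemset containing both can be frequent, closed or otherwise. The function names `pair_supports` and `prunable_pairs` and the toy data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def pair_supports(transactions):
    """Count how many transactions contain each unordered pair of items."""
    counts = Counter()
    for t in transactions:
        # Sort so that each pair is always stored as (smaller, larger).
        for a, b in combinations(sorted(set(t)), 2):
            counts[(a, b)] += 1
    return counts


def prunable_pairs(transactions, min_support):
    """Return pairs of frequent items whose joint support is below min_support.

    Any candidate itemset containing such a pair can be skipped without
    computing its support, because support never grows when items are added.
    """
    item_counts = Counter(i for t in transactions for i in set(t))
    frequent = sorted(i for i, c in item_counts.items() if c >= min_support)
    pairs = pair_supports(transactions)
    # Counter returns 0 for pairs that never co-occur.
    return {(a, b) for a, b in combinations(frequent, 2)
            if pairs[(a, b)] < min_support}


# Toy data (hypothetical): five transactions, absolute threshold of 3.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "d"},
    {"a", "b", "c"},
]
print(prunable_pairs(transactions, min_support=3))
```

In this toy example the items a, b and c are each frequent, but b and c co-occur in only two transactions, so a miner could skip every candidate that extends an itemset containing b with c (or vice versa) before doing any intersection work.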