Journal of Jilin University (Science Edition), 2021, Vol. 59, Issue (3): 635-642.


THIMFUP Algorithm Based on the Most Frequent Item Extraction and Candidate Set Pruning

YANG Yong1,2, ZHANG Lei1, QU Fuheng1, LIU Junjie1, CHEN Qiang1

  1. School of Computer Science and Technology, Changchun University of Science and Technology, Changchun 130022, China;
    2. Institute of Education, Changchun Normal University, Changchun 130032, China
  • Received: 2020-09-15 Online: 2021-05-26 Published: 2021-05-23
  • Contact: YANG Yong E-mail: yy@cust.edu.cn

Abstract: The FBCM algorithm (FUP (fast update algorithm) based on matrix compression) repeatedly scans the original frequent itemset library and generates a large number of candidate sets during itemset mining. To address this problem, we propose a method that extracts the most frequent items in the database to reduce the number of scans of the original frequent itemset library, and that applies candidate set pruning to reduce the number of candidate sets generated over the whole run of the algorithm, thereby improving the speed of frequent itemset mining. Experimental results show that, under the same experimental conditions, the proposed algorithm is more than 15% more efficient than the FBCM algorithm, and up to 60% more efficient.

Key words: association rule, incremental mining, candidate set pruning, most frequent item

CLC number: TP301.6
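
The abstract names two ideas, counting items to find the most frequent ones and pruning candidate sets, without giving their details. The sketch below is not the THIMFUP or FBCM procedure. It only illustrates the generic Apriori-style versions of these two steps, under the assumption that "pruning" means discarding any candidate that has an infrequent subset. The function names `frequent_items` and `generate_candidates` and the sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def frequent_items(transactions, min_count):
    """Return items whose support count is at least min_count, most frequent first."""
    # Count each item at most once per transaction.
    counts = Counter(item for t in transactions for item in set(t))
    return [item for item, c in counts.most_common() if c >= min_count]


def generate_candidates(frequent_prev, k):
    """Join frequent (k-1)-itemsets into k-itemsets, then prune.

    A candidate is kept only if every one of its (k-1)-subsets is itself
    frequent (the downward-closure property used by Apriori).
    """
    prev = set(frequent_prev)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) != k:
                continue
            if all(frozenset(s) in prev for s in combinations(union, k - 1)):
                candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets; {a,b,d} and {b,c,d} are pruned because
# {a,d} and {c,d} are not frequent.
frequent_2 = [frozenset("ab"), frozenset("ac"), frozenset("bc"), frozenset("bd")]
candidates_3 = generate_candidates(frequent_2, 3)
```

Pruning before support counting matters because each surviving candidate normally costs a pass over the data (or over a compressed representation of it), so fewer candidates means fewer scans.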