吉林大学学报(理学版)

• 计算机科学 • 上一篇    下一篇

基于垂直格式的频繁项集挖掘分段算法

王红梅, 胡明, 赵守峰   

  1. 长春工业大学 计算机科学与工程学院, 长春 130012
  • 收稿日期:2015-04-30 出版日期:2016-05-26 发布日期:2016-05-20
  • 通讯作者: 胡明 E-mail:huming@ccut.edu.cn

Frequent Itemsets Mining Segmentation Algorithm Based on Vertical Format

WANG Hongmei, HU Ming, ZHAO Shoufeng   

  1. School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
  • Received:2015-04-30 Online:2016-05-26 Published:2016-05-20
  • Contact: HU Ming E-mail:huming@ccut.edu.cn

摘要:

针对Eclat算法连接和剪枝操作耗时的缺点, 按照项集之间的可连接性, 将数据集划分为等价类并分段存储, 采用末项剪枝策略, 在常量时间内完成连接和剪枝操作. 针对Eclat算法求长集合的交集操作需要大量计算的缺点, 采用多维数组分段存储项集的事务集, 将长集合的求交集操作转换为分段求短集合的交集, 并提出期望支持度的概念, 在求交集的过程中预测支持度, 从而减少求交集的比较次数. 实验结果表明, 该算法在时间性能方面优于Eclat算法, 尤其适用于挖掘长模式稀疏数据集.

关键词: 频繁项集, 垂直格式, 分段存储, 期望支持度

Abstract:

In view of shortage of timeconsuming of connection and pruning step for Eclat algorithm, a method is proposed to divide the data set into equivalence classes with segmented storage according to the connectivity between itemsets. Using the end item pruning strategy, the connection and pruning step will be completed in constant time. In view of shortage of computation of the intersection operation of long sets for Eclat algorithm, a method is proposed to store the transaction sets of itemsets segment by multidimensional array, convert the computation of intersection operation of long sets into short sets in segment, and the concept of the expected support is proposed. It can be forecasted in the process of calculating intersection, so the times of comparing will be reduced. The experimental results show that the algorithm is superior to Eclat algorithm in time performance, and it is suitable for mining long patterns sparse data sets especially.

Key words: frequent itemset, vertical format, segmented storage; expected support

中图分类号: 

  • TP311.13