Journal of Jilin University Science Edition

Frequent Itemsets Mining Segmentation Algorithm Based on Vertical Format

WANG Hongmei, HU Ming, ZHAO Shoufeng   

  1. School of Computer Science and Engineering, Changchun University of Technology, Changchun 130012, China
  • Received: 2015-04-30 Online: 2016-05-26 Published: 2016-05-20
  • Contact: HU Ming E-mail: huming@ccut.edu.cn

Abstract:

To address the time-consuming connection and pruning steps of the Eclat algorithm, a method is proposed that divides the data set into equivalence classes, stored in segments, according to the connectivity between itemsets. With an end-item pruning strategy, the connection and pruning steps are completed in constant time. To address the cost of intersecting long transaction sets in the Eclat algorithm, a method is proposed that stores the transaction sets of itemsets in segments using a multidimensional array, converting the intersection of long sets into intersections of short sets within each segment. The concept of expected support is also introduced: it can be predicted while the intersection is being computed, which reduces the number of comparisons. Experimental results show that the algorithm outperforms the Eclat algorithm in running time and is especially suitable for mining long patterns in sparse data sets.
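The segmented intersection with a support forecast can be illustrated with a minimal Python sketch. The abstract does not define the storage layout or the expected support formula, so everything here is an assumption made for illustration: segments are buckets keyed by `tid // seg_size` (a dictionary stands in for the multidimensional array), and the expected support is taken to be the matches found so far plus the smaller count of transaction IDs still unexamined on either side. The names `segment_tids` and `intersect_segmented` are hypothetical and not from the paper.

```python
def segment_tids(tids, seg_size):
    """Split a sorted list of transaction IDs into segments keyed by tid // seg_size."""
    segments = {}
    for tid in tids:
        segments.setdefault(tid // seg_size, []).append(tid)
    return segments


def intersect_segmented(a, b, min_support):
    """Intersect two segmented tidsets segment by segment.

    Returns the segmented intersection, or None as soon as the assumed
    upper bound on the final support falls below min_support.
    """
    remaining_a = sum(len(seg) for seg in a.values())
    remaining_b = sum(len(seg) for seg in b.values())
    result = {}
    found = 0
    for key in sorted(set(a) | set(b)):
        seg_a = a.get(key, [])
        seg_b = b.get(key, [])
        remaining_a -= len(seg_a)
        remaining_b -= len(seg_b)
        if seg_a and seg_b:
            # Only short, same-segment lists are ever compared.
            common = sorted(set(seg_a) & set(seg_b))
            if common:
                result[key] = common
                found += len(common)
        # Assumed forecast: matches so far plus the most that could still match.
        if found + min(remaining_a, remaining_b) < min_support:
            return None
    return result if found >= min_support else None


tids_x = segment_tids([1, 2, 3, 10, 11, 25], seg_size=10)
tids_y = segment_tids([2, 3, 11, 30], seg_size=10)
print(intersect_segmented(tids_x, tids_y, min_support=3))
print(intersect_segmented(tids_x, tids_y, min_support=4))
```

In this sketch the second call stops before the last segment is examined, because the bound already shows that the candidate itemset cannot reach the minimum support.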

Key words: frequent itemset; vertical format; segmented storage; expected support

CLC Number: 

  • TP311.13