

Fast Frequent Sequential Pattern Mining Algorithm

GUAN En-zheng, CHANG Xiao-yu, WANG Zhe, ZHOU Chun-guang

  1. (College of Computer Science and Technology, Jilin University, Changchun 130012, China)
  • Received: 2005-03-07 Online: 2005-11-26 Published: 2005-11-26
  • Contact: ZHOU Chun-guang

Abstract: A novel algorithm, FFSPAN (fast frequent sequential pattern mining algorithm), is proposed to address the high computational complexity that can arise when mining long patterns from a sequence database. Traditionally, judging whether a subsequence is frequent requires comparing the whole subsequence against every sequence in the original database. FFSPAN instead only needs to search the sequence database for a single frequent item or frequent itemset rather than a complete frequent sequence. Moreover, the databases that FFSPAN scans shrink rapidly from one pass to the next, which makes the algorithm more efficient the longer the sequential patterns being mined. Experiments on standard test datasets show that FFSPAN is highly effective.

Key words: sequential pattern, long pattern, depthfirst, data mining

CLC number: TP31
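The abstract does not spell out FFSPAN's individual steps, so the following is only a minimal sketch of the general idea it describes, written with the well-known prefix-projection technique (as used in PrefixSpan) rather than FFSPAN itself. Each pattern is extended depth-first by looking for one frequent item in a projected database that keeps only the suffixes following the current prefix, so the database scanned at each level keeps shrinking. For contrast, `support_by_scan` performs the traditional check of a whole candidate subsequence against every original sequence. Assumptions: each sequence element is a single item rather than an itemset, `min_sup` is an absolute count, and all function names and the example data are hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# Assumption: each sequence element is a single item (itemsets are not modeled).
Seq = List[str]
Pattern = Tuple[str, ...]


def is_subsequence(pattern: Seq, seq: Seq) -> bool:
    """Traditional test: do the items of `pattern` appear in `seq` in order (gaps allowed)?"""
    it = iter(seq)
    # `item in it` advances the iterator past the first match, so order is preserved.
    return all(item in it for item in pattern)


def support_by_scan(pattern: Seq, db: List[Seq]) -> int:
    """Support counted the traditional way: test the whole pattern against every sequence."""
    return sum(is_subsequence(pattern, seq) for seq in db)


def project(db: List[Seq], item: str) -> List[Seq]:
    """Keep only the suffix after the first occurrence of `item` in each sequence.

    Sequences without `item` and empty suffixes are dropped, so the projected
    database is never larger than `db`.
    """
    projected = []
    for seq in db:
        if item in seq:
            suffix = seq[seq.index(item) + 1:]
            if suffix:
                projected.append(suffix)
    return projected


def mine(db: List[Seq], min_sup: int,
         prefix: Optional[Seq] = None,
         out: Optional[Dict[Pattern, int]] = None) -> Dict[Pattern, int]:
    """Depth-first mining: extend `prefix` by one frequent item at a time."""
    prefix = [] if prefix is None else prefix
    out = {} if out is None else out
    # Count each item at most once per sequence of the (projected) database.
    counts = Counter(item for seq in db for item in set(seq))
    for item, sup in sorted(counts.items()):
        if sup < min_sup:
            continue
        pattern = prefix + [item]
        out[tuple(pattern)] = sup
        # Recurse on a smaller database: only the suffixes that follow `item`.
        mine(project(db, item), min_sup, pattern, out)
    return out


# Hypothetical toy database used for illustration only.
EXAMPLE_DB: List[Seq] = [
    ["a", "b", "c", "d"],
    ["a", "c", "d"],
    ["b", "c", "d"],
    ["a", "b", "d"],
]

if __name__ == "__main__":
    for pattern, sup in mine(EXAMPLE_DB, min_sup=3).items():
        print(pattern, sup)
```

Every support that `mine` reports agrees with the brute-force count from `support_by_scan`, even though `mine` never re-tests a whole pattern: after projection, checking for a single item in the remaining suffixes is enough, which is the kind of saving the abstract attributes to searching for one frequent item instead of a complete sequence.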