Journal of Jilin University (Information Science Edition) ›› 2023, Vol. 41 ›› Issue (1): 174-179.


Algorithm Design for Mining Frequent Patterns in Distributed Multidimensional Data Streams

SHI Yifei

  1. (School of Intelligence Technology, Geely University of China, Chengdu 641423, China)
  • Received: 2022-01-14 Online: 2023-02-08 Published: 2023-02-09
  • About the author: SHI Yifei (1983- ), male, born in Yangzhou, Jiangsu; professor at Geely University of China and doctoral candidate; research interests: artificial intelligence and software technology. (Tel) 86-18211139989 (E-mail) pppxx999@126.com.
  • Supported by:
    Beijing Higher Education Undergraduate Teaching Reform and Innovation Fund Project (2022xzky001)

Abstract: Existing frequent pattern mining algorithms for distributed multidimensional data streams do not delete infrequent itemsets from the streams and therefore suffer from long average processing times. To address this, a distributed multidimensional data stream frequent pattern mining algorithm based on an artificial neural network is proposed. Drawing on the characteristics of artificial neural networks, the method builds an artificial neural network model and trains it on the multidimensional data streams to improve mining efficiency. Based on the training results, it constructs a frequent pattern information tree for the data streams, the FR-tree (Frequent Pattern tree). Because the FR-tree contains many expired multidimensional data stream entries, the tree is pruned and infrequent itemsets are deleted, which speeds up the computation of frequent patterns. A distributed mining algorithm then mines the global FR-tree to obtain the complete set of frequent itemsets of the multidimensional data streams, thereby realizing frequent pattern mining over distributed multidimensional data streams. Tests of the method's average processing time verify its practicality.

Key words: artificial neural network, distributed multidimensional data stream, frequent patterns, mining algorithm, frequent pattern tree (FR-tree)

CLC Number:

  • TP274
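
The pruning and global mining steps named in the abstract can be illustrated with a minimal Python sketch. It is not the paper's method: the neural network stage is omitted, brute-force support counting stands in for FR-tree mining, and every name here (`local_counts`, `prune`, `frequent_itemsets`, `max_age`, `min_support`, the two sample sites) is a hypothetical choice made only for illustration.

```python
from collections import Counter
from itertools import combinations


def local_counts(transactions, now, max_age):
    """Count item supports at one site, skipping expired records."""
    counts = Counter()
    live = []
    for timestamp, items in transactions:
        if now - timestamp > max_age:
            continue  # expired record: dropped before counting
        live.append(frozenset(items))
        counts.update(set(items))
    return counts, live


def prune(live, global_counts, min_support):
    """Remove globally infrequent items from every live record."""
    keep = {item for item, c in global_counts.items() if c >= min_support}
    pruned = [t & keep for t in live]
    return [t for t in pruned if t]  # discard records left empty


def frequent_itemsets(transactions, min_support, max_size=3):
    """Brute-force support counting (a stand-in for tree-based mining)."""
    counts = Counter()
    for t in transactions:
        for k in range(1, min(len(t), max_size) + 1):
            for combo in combinations(sorted(t), k):
                counts[combo] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


# Two hypothetical sites, each holding (timestamp, items) records.
site_a = [(1, ["a", "b"]), (8, ["a", "b", "c"]), (9, ["a", "d"])]
site_b = [(7, ["a", "b"]), (9, ["b", "c"]), (10, ["a", "b", "e"])]

now, max_age, min_support = 10, 5, 2
counts_a, live_a = local_counts(site_a, now, max_age)
counts_b, live_b = local_counts(site_b, now, max_age)

# Merge per-site counts into global supports, then prune and mine.
global_counts = counts_a + counts_b
pruned = prune(live_a + live_b, global_counts, min_support)
result = frequent_itemsets(pruned, min_support)
```

Removing expired records and globally infrequent items before enumeration shrinks the set of candidate itemsets that must be counted, which is the intuition behind the processing-time claim in the abstract.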