Journal of Jilin University (Information Science Edition) ›› 2026, Vol. 44 ›› Issue (1): 178-184.


Parallel Mining Algorithm for Frequent Patterns in Multidimensional Data Streams in Hadoop Environment

FAN Zhou

  1. Information Center, Huzhou Central Hospital, Huzhou 313000, China
  • Received: 2023-12-04  Online: 2026-01-31  Published: 2026-02-04
  • About the author: FAN Zhou (born 1982), male, from Huzhou, Zhejiang; engineer at Huzhou Central Hospital; research interests: information system development and application, and information system security and maintenance. (Tel) 86-15757269769 (E-mail) jswztg@163.com
  • Funding:
    Scientific Research Fund of Zhejiang Provincial Department of Education (Y202249500)


Abstract: To address the characteristics and complexity of multidimensional data streams, make full use of parallel computing resources, and keep the algorithm scalable, a parallel algorithm for mining frequent patterns from multidimensional data streams in a Hadoop environment is proposed. First, a Hadoop data stream processing platform is designed on top of HDFS (Hadoop Distributed File System) and MapReduce. Next, an HpFitStream clustering algorithm based on feature projection and fitting is proposed: its polynomial fitting algorithm handles abnormal data streams, and feature projection then reduces the dimensionality of the processed streams to lower the computational cost. Finally, the PFPonCanTree algorithm performs parallel frequent pattern mining of the multidimensional data streams in the Hadoop environment. Experimental results show that the proposed method effectively reduces computational complexity while improving the algorithm's scalability and load balancing.

Key words: Hadoop, distributed computing, MapReduce model, feature projection, polynomial fitting, frequent pattern, parallel mining
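The abstract names PFPonCanTree but gives no details. The following is a minimal sketch, not the author's implementation. It assumes the algorithm combines the group-dependent sharding of Parallel FP-Growth (PFP) with the canonical item order of a CanTree. Several other parts are assumptions. MapReduce is simulated in-process rather than through the Hadoop API, and lexicographic order serves as the canonical order. The round-robin item-to-group assignment is hypothetical. Inside each reducer, a brute-force subset count stands in for building a local CanTree. All function names are illustrative.

```python
from collections import Counter, defaultdict
from itertools import combinations


def count_items(transactions):
    """Counting pass: global support of every single item."""
    counts = Counter()
    for t in transactions:
        counts.update(set(t))
    return counts


def map_shards(transaction, group_of, frequent):
    """Map phase: emit (group id, prefix) pairs, PFP style.

    Items are kept in a fixed canonical (here lexicographic) order, as in a
    CanTree, so the order does not change when new stream batches arrive.
    """
    items = sorted({i for i in transaction if i in frequent})
    emitted = set()
    # Walk backwards: each group receives, at most once per transaction,
    # the prefix that ends at that group's last item in the transaction.
    for pos in range(len(items) - 1, -1, -1):
        g = group_of[items[pos]]
        if g not in emitted:
            emitted.add(g)
            yield g, tuple(items[: pos + 1])


def reduce_group(group, shard, group_of, min_support):
    """Reduce phase: count itemsets whose last (largest) item is in `group`.

    Brute-force enumeration replaces the local CanTree/FP-growth step.
    """
    counts = Counter()
    for prefix in shard:
        for size in range(1, len(prefix) + 1):
            # combinations() of a sorted tuple yields sorted tuples,
            # so itemset[-1] is the itemset's largest item.
            for itemset in combinations(prefix, size):
                if group_of[itemset[-1]] == group:
                    counts[itemset] += 1
    return {s: c for s, c in counts.items() if c >= min_support}


def parallel_frequent_patterns(transactions, num_groups, min_support):
    counts = count_items(transactions)
    frequent = sorted(i for i, c in counts.items() if c >= min_support)
    frequent_set = set(frequent)
    # Hypothetical grouping: round-robin over the canonical item order.
    group_of = {item: k % num_groups for k, item in enumerate(frequent)}
    shards = defaultdict(list)  # simulated shuffle: group id -> prefixes
    for t in transactions:
        for g, prefix in map_shards(t, group_of, frequent_set):
            shards[g].append(prefix)
    result = {}
    for g, shard in shards.items():
        # Each itemset is owned by exactly one group, so a plain merge suffices.
        result.update(reduce_group(g, shard, group_of, min_support))
    return result


if __name__ == "__main__":
    batch = [["a", "b", "c"], ["a", "c"], ["a", "d"], ["b", "c"], ["a", "b", "c", "d"]]
    print(parallel_frequent_patterns(batch, num_groups=2, min_support=3))
```

For the sample batch with `min_support=3`, the sketch finds five frequent itemsets: {a} (4), {c} (4), {b} (3), {a, c} (3) and {b, c} (3). The result is the same for any number of groups. This holds because every itemset is counted only by the reducer that owns its last item in canonical order, and that reducer sees each containing transaction exactly once. Because the order is fixed in advance rather than based on item frequency, new batches never force a reordering. That property is the reason CanTree-style structures suit data streams.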

CLC number: TP311