Journal of Jilin University (Information Science Edition) ›› 2024, Vol. 42 ›› Issue (4): 726-732.


Data Retrieval Method of Unbalanced Streaming Based on Multi-Similarity Fuzzy C-Means Clustering

HAN Yunna

  1. Basic Department, Modern College of Northwest University, Xi'an 710130, China

  • Received: 2023-05-26 Online: 2024-07-22 Published: 2024-07-22
  • About the author: HAN Yunna (1984- ), female, born in Xi'an, associate professor at Modern College of Northwest University, mainly engaged in research on number theory. (Tel) 86-15319416627 (E-mail) yunnahan@163.com
  • Supported by:

    Special Scientific Research Fund of the Education Department of Shaanxi Province (20JK0950)

Abstract: When unbalanced stream data is retrieved, retrieval performance degrades because the stream itself is unbalanced and is easily affected by differential data and edge data. To address this problem, a retrieval method for unbalanced stream data based on multi-similarity fuzzy C-means clustering is proposed. The method computes multiple similarities between items of the unbalanced stream data and clusters data of differing similarity with the fuzzy C-means algorithm. An octree retrieval model is then constructed to store, encode, and judge the clustered data, which completes the retrieval of the unbalanced stream data. Experimental results show that the retrieval time of the proposed method is below 20 s, that recall and precision remain above 80%, and that the NDCG (Normalized Discounted Cumulative Gain) values are high.
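The clustering step named in the abstract can be illustrated with a minimal sketch of the standard fuzzy C-means iteration. This is not the paper's implementation: it assumes plain Euclidean distance in place of the multi-similarity measure, and the function name `fuzzy_c_means`, its parameters, and the sample points are hypothetical choices made for illustration.

```python
import math
import random


def fuzzy_c_means(points, n_clusters, m=2.0, n_iter=100, tol=1e-6, seed=0):
    """Standard fuzzy C-means with Euclidean distance (illustrative only)."""
    rng = random.Random(seed)
    dim = len(points[0])
    # Random initial memberships; each row is normalised to sum to 1.
    u = []
    for _ in points:
        row = [rng.random() + 1e-9 for _ in range(n_clusters)]
        total = sum(row)
        u.append([value / total for value in row])

    centers = []
    exponent = 2.0 / (m - 1.0)
    for _ in range(n_iter):
        # Update each centre as the membership-weighted mean of the points.
        centers = []
        for j in range(n_clusters):
            weights = [u[i][j] ** m for i in range(len(points))]
            total = sum(weights)
            centers.append([
                sum(w * p[d] for w, p in zip(weights, points)) / total
                for d in range(dim)
            ])

        # Update memberships from the distances to every centre.
        shift = 0.0
        for i, p in enumerate(points):
            dists = [math.dist(p, c) for c in centers]
            if min(dists) == 0.0:
                # The point coincides with a centre: assign it fully there.
                zeros = [1.0 if d == 0.0 else 0.0 for d in dists]
                new_row = [z / sum(zeros) for z in zeros]
            else:
                new_row = [
                    1.0 / sum((dj / dk) ** exponent for dk in dists)
                    for dj in dists
                ]
            shift = max(shift, max(abs(a - b) for a, b in zip(new_row, u[i])))
            u[i] = new_row
        if shift < tol:
            break
    return centers, u


# Hypothetical sample: two well-separated groups of 2D points.
data = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centers, memberships = fuzzy_c_means(data, n_clusters=2)
labels = [row.index(max(row)) for row in memberships]
```

Each row of `memberships` gives one point's degree of belonging to every cluster, which is what distinguishes fuzzy C-means from hard clustering. Swapping `math.dist` for a different dissimilarity function would be the natural place to plug in a multi-similarity measure.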

Key words: standard feature matrix, cross cluster, data encoding filter, unbalanced measure, 3D coordinates, judgment code

CLC Number:

  • TP393.08
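
The abstract reports NDCG (Normalized Discounted Cumulative Gain) as one of its evaluation measures. As a reminder of what that number represents, here is a minimal sketch of one common definition, using linear gain and a log2 rank discount. The abstract does not say which NDCG variant the paper uses, so the formula choice and the function names `dcg` and `ndcg` are assumptions.

```python
import math


def dcg(relevances):
    """Discounted cumulative gain of a ranked list of graded relevances."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances, start=1))


def ndcg(relevances, k=None):
    """DCG of the ranking divided by the DCG of the ideal (sorted) ranking."""
    ranked = list(relevances)[:k]
    ideal = sorted(relevances, reverse=True)[:k]
    ideal_dcg = dcg(ideal)
    # A list with no relevant items has no meaningful ideal; return 0.0.
    return dcg(ranked) / ideal_dcg if ideal_dcg > 0 else 0.0
```

A ranking that already lists results in order of decreasing relevance scores exactly 1.0, and any other ordering of the same items scores lower.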