吉林大学学报(工学版) ›› 2022, Vol. 52 ›› Issue (4): 874-884.doi: 10.13229/j.cnki.jdxbgxb20200877
• 计算机科学与技术 • 上一篇
Xiao-hui WEI1,2(),Yan-wei MIAO1,Xing-wang WANG1,2()
摘要:
记录数据出现的次数(频数估计)是流数据管理的一类重要任务。在该类任务的解决方案中,sketch是内存使用效率较高的数据结构之一。目前的多层sketch方案,通常使用相同长度的计数器记录数据能够到达的最高层级。然而,数据频数的高度偏斜会导致部分计数器的高位内存被浪费。另一方面,由于数据分布是不断变化的,静态的sketch数据结构会造成内存分配不合理的问题。为了解决上述问题,本文提出一种弹性层级结构Rhombus sketch,能够按需将计数器分配给数据,并通过估计数据的分布动态调整内存分配。实验结果表明,相比于其他典型sketch方法,特别是在内存紧张的情况下,Rhombus sketch的准确性有明显的提高。
中图分类号:
1 | Rehman M H, Liew C S, Abbas A, et al. Big data reduction methods: a survey[J]. Data Science and Engineering, 2016, 1: 265-284. |
2 | Li T, Chen S G, Ling Y B. Fast and compact per-flow traffic measurement through randomized counter sharing[C]∥Proceedings IEEE INFOCOM, Shanghai, China, 2012: 1622-1634. |
3 | Bertino Elisa. Introduction to data security and privacy[J]. Data Science and Engineering, 2016, 1: 125-126. |
4 | Mirylenka K, Cormode G, Palpanas T, et al. Conditional heavy hitters: detecting interesting correlations in data streams[J]. The VLDB Journal, 2015, 24: 395-414. |
5 | Roy P, Khan A, Alonso G. Augmented sketch: faster and more accurate stream processing[C]∥ Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 2016: 1449-1463. |
6 | Pandey P, Bender M A, Johnson R, et al. A general-purpose counting filter: making every bit count[C]∥Proceedings of the ACM International Conference on Management of Data, Chicago, USA: 2017: 775-787. |
7 | Chen J C, Zhang Q. Bias-aware sketches[J]. Proceedings of the VLDB Endowment, 2017, 10(9): 961-972. |
8 | Gong J Z, Tian D Y, Yang T, et al. SSS: an accurate and fast algorithm for finding top-k hot items in data streams[C]∥IEEE International Conference on Big Data and Smart Computing (BigComp), Shanghai, China, 2018:106-113. |
9 | Qi J H, Li W J, Yang T, et al. Cuckoo counter: a novel framework for accurate per-flow frequency estimation in network measurement[C]∥ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Cambridge, United Kingdom, 2019:1-7. |
10 | Zhou Y, Yang T, Jiang T, et al. Cold filter: a meta-framework for faster and more accurate stream processing[C]∥Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 2018: 741-756. |
11 | Yang T, Zhou Y, Jin H, et al. Pyramid sketch: a sketch framework for frequency estimation of data streams[J]. Proceedings of the VLDB Endowment, 2017, 10(11): 1442-1453. |
12 | Yang T, Gao S, Sun Z Y, et al. Diamond sketch: accurate per-flow measurement for big streaming data[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(12): 2650-2662. |
13 | Li Y L, Miao R, Kim C H, et al. FlowRadar: a better netflow for data centers [C]∥ USENIX Symposium on Networked Systems Design and Implementation, Santa Clara, CA, USA:2016: 311-324. |
14 | Cormode G, Muthukishnam S. An improved data stream summary: the count-min sketch and its applications[J]. Journal of Algorithms, 2005, 55(1): 58-75. |
15 | Estan C, Varghese G. New directions in traffic measurement and accounting[J]. ACM SIGCOMM Computer Communication Review, 2002, 32(4): 323-336. |
16 | Kumar A, Sung M H, Xu J, et al. Data streaming algorithms for efficient and accurate estimation of flow size distribution[C]∥Proceedings of the International Conference on Measurements and Modeling of Computer Systems, New York, NY, USA, 2004: 177-188. |
17 | Ying S S, Korn F, Saha B, et al. Treescope: finding structural anomalies in semi-structure data[J].Proceedings of the VLDB Endowment, 2015, 8(12): 1904-1907. |
18 | Regents of the University of California. The CAIDA anonymized internet traces 2018[DB/OL].[2020-10-22]. |
19 | Real-life transactional dataset[DB/OL].[2020-10-22]. |
20 | dataset Webdocs. [DB/OL].[2020-10-22]. |
[1] | 毛琳,任凤至,杨大伟,张汝波. 双向特征金字塔全景分割网络[J]. 吉林大学学报(工学版), 2022, 52(3): 657-665. |
[2] | 王雪,李占山,吕颖达. 基于多尺度感知和语义适配的医学图像分割算法[J]. 吉林大学学报(工学版), 2022, 52(3): 640-647. |
[3] | 欧阳继红,郭泽琪,刘思光. 糖尿病视网膜病变分期双分支混合注意力决策网络[J]. 吉林大学学报(工学版), 2022, 52(3): 648-656. |
[4] | 王学智,李清亮,李文辉. 融合迁移学习的土壤湿度预测时空模型[J]. 吉林大学学报(工学版), 2022, 52(3): 675-683. |
[5] | 康苏明,张叶娥. 基于Hadoop的跨社交网络局部时序链路预测算法[J]. 吉林大学学报(工学版), 2022, 52(3): 626-632. |
[6] | 曲优,李文辉. 基于锚框变换的单阶段旋转目标检测方法[J]. 吉林大学学报(工学版), 2022, 52(1): 162-173. |
[7] | 赵宏伟,霍东升,王洁,李晓宁. 基于显著性检测的害虫图像分类[J]. 吉林大学学报(工学版), 2021, 51(6): 2174-2181. |
[8] | 刘洲洲,张倩昀,马新华,彭寒. 基于优化离散差分进化算法的压缩感知信号重构[J]. 吉林大学学报(工学版), 2021, 51(6): 2246-2252. |
[9] | 王生生,陈境宇,卢奕南. 基于联邦学习和区块链的新冠肺炎胸部CT图像分割[J]. 吉林大学学报(工学版), 2021, 51(6): 2164-2173. |
[10] | 刘福寿,魏琦,徐文婷,谭国金. 基于弹性波传播和谱单元法的桁架结构损伤检测[J]. 吉林大学学报(工学版), 2021, 51(6): 2087-2095. |
[11] | 孙东明,胡亮,邢永恒,王峰. 基于文本融合的物联网触发动作编程模式服务推荐方法[J]. 吉林大学学报(工学版), 2021, 51(6): 2182-2189. |
[12] | 林俊聪,雷钧,陈萌,郭诗辉,高星,廖明宏. 基于电影视觉特性的动态多目标实时相机规划[J]. 吉林大学学报(工学版), 2021, 51(6): 2154-2163. |
[13] | 任丽莉,王志军,闫冬梅. 结合黏菌觅食行为的改进多元宇宙算法[J]. 吉林大学学报(工学版), 2021, 51(6): 2190-2197. |
[14] | 姚引娣,贺军瑾,李杨莉,谢荡远,李英. 自构建改进型鲸鱼优化BP神经网络的ET0模拟计算[J]. 吉林大学学报(工学版), 2021, 51(5): 1798-1807. |
[15] | 赵宏伟,张子健,李蛟,张媛,胡黄水,臧雪柏. 基于查询树的双向分段防碰撞算法[J]. 吉林大学学报(工学版), 2021, 51(5): 1830-1837. |
|