Journal of Jilin University(Engineering and Technology Edition) ›› 2022, Vol. 52 ›› Issue (4): 874-884.doi: 10.13229/j.cnki.jdxbgxb20200877

Previous Articles    

Rhombus sketch: adaptive and more accurate sketch for streaming data

Xiao-hui WEI1,2(),Yan-wei MIAO1,Xing-wang WANG1,2()   

  1. 1.College of Computer Science and Technology,Jilin University,Changchun 130012,China
    2.Key Laboratory of Symbolic Computation and Knowledge Engineering of Ministry of Education,Jilin University,Changchun 130012,China
  • Received:2020-11-18 Online:2022-04-01 Published:2022-04-20
  • Contact: Xing-wang WANG E-mail:weixh@jlu.edu.cn;xww@jlu.edu.cn

Abstract:

Estimating the frequency of each data (frequency estimation) is a critical task in stream data management. Sketch is one of the most memory-efficient structure for solving this kind of task. Existing multi-layer sketches often use counters of the same size to record the highest layer that data could be reached. However, many counters represent very small values, which is a waste of memory. On the other hand, multi-layer static sketches cause the problem of unreasonable memory allocation due to the change of data distribution. To address the issues, this paper proposes an elastic hierarchical structure, named Rhombus sketch. Rhombus sketch can assign an appropriate number of counters to each data on demand, and adjust memory allocation by estimating the data distribution. Experimental results show that, compared with other typical sketches, the accuracy of Rhombus sketch is greatly improved, especially for tight memory.

Key words: computer application, data streams, approximate algorithms, elasticity, sketch

CLC Number: 

  • TP399

Table 1

Meaning of each variable in Rhombus sketch"

变量名变量代表含义

Mi

Ki

n

m

flag

fi(e)

LF

CF

δ

u

d

rand

计数结构每层初始分配的计数器个数

记录结构每层计数器的个数

R1记录数据个数

L2的数据数量

内存调整标志

不同函数下元素e的标识

溢出标志

标识数据是否出现在内存调整后

计数部分计数器的位数

调整内存判定标准

hash函数个数

0~1之间随机数

Fig.1

Data structure of Rhombus sketch"

Fig.2

Comparison of ARE on transactional datasets"

Fig.3

Comparison of ARE on CAIDA datasets"

Fig.4

Comparison of AAE on transactional datasets"

Fig.5

Comparison of AAE on CAIDA datasets"

Fig.6

Comparison of ARE on different memory size"

Fig.7

Comparison of AAE on different memory size"

Fig.8

Comparison of performance rate of evaluation criteria on different skewness datasets"

Fig.9

Comparison of insertion throughput on different skewness datasets"

Fig.10

Comparison of query throughput on different skewness datasets"

Fig.11

Comparison of correct rate on different memory size"

Fig.12

Real-time frequency distribution (50 kB)"

Fig.13

Real-time frequency distribution (200 kB)"

1 Rehman M H, Liew C S, Abbas A, et al. Big data reduction methods: a survey[J]. Data Science and Engineering, 2016, 1: 265-284.
2 Li T, Chen S G, Ling Y B. Fast and compact per-flow traffic measurement through randomized counter sharing[C]∥Proceedings IEEE INFOCOM, Shanghai, China, 2012: 1622-1634.
3 Bertino Elisa. Introduction to data security and privacy[J]. Data Science and Engineering, 2016, 1: 125-126.
4 Mirylenka K, Cormode G, Palpanas T, et al. Conditional heavy hitters: detecting interesting correlations in data streams[J]. The VLDB Journal, 2015, 24: 395-414.
5 Roy P, Khan A, Alonso G. Augmented sketch: faster and more accurate stream processing[C]∥ Proceedings of the 2016 International Conference on Management of Data, San Francisco, CA, USA, 2016: 1449-1463.
6 Pandey P, Bender M A, Johnson R, et al. A general-purpose counting filter: making every bit count[C]∥Proceedings of the ACM International Conference on Management of Data, Chicago, USA: 2017: 775-787.
7 Chen J C, Zhang Q. Bias-aware sketches[J]. Proceedings of the VLDB Endowment, 2017, 10(9): 961-972.
8 Gong J Z, Tian D Y, Yang T, et al. SSS: an accurate and fast algorithm for finding top-k hot items in data streams[C]∥IEEE International Conference on Big Data and Smart Computing (BigComp), Shanghai, China, 2018:106-113.
9 Qi J H, Li W J, Yang T, et al. Cuckoo counter: a novel framework for accurate per-flow frequency estimation in network measurement[C]∥ACM/IEEE Symposium on Architectures for Networking and Communications Systems (ANCS), Cambridge, United Kingdom, 2019:1-7.
10 Zhou Y, Yang T, Jiang T, et al. Cold filter: a meta-framework for faster and more accurate stream processing[C]∥Proceedings of the 2018 International Conference on Management of Data, Houston, TX, USA, 2018: 741-756.
11 Yang T, Zhou Y, Jin H, et al. Pyramid sketch: a sketch framework for frequency estimation of data streams[J]. Proceedings of the VLDB Endowment, 2017, 10(11): 1442-1453.
12 Yang T, Gao S, Sun Z Y, et al. Diamond sketch: accurate per-flow measurement for big streaming data[J]. IEEE Transactions on Parallel and Distributed Systems, 2019, 30(12): 2650-2662.
13 Li Y L, Miao R, Kim C H, et al. FlowRadar: a better netflow for data centers [C]∥ USENIX Symposium on Networked Systems Design and Implementation, Santa Clara, CA, USA:2016: 311-324.
14 Cormode G, Muthukishnam S. An improved data stream summary: the count-min sketch and its applications[J]. Journal of Algorithms, 2005, 55(1): 58-75.
15 Estan C, Varghese G. New directions in traffic measurement and accounting[J]. ACM SIGCOMM Computer Communication Review, 2002, 32(4): 323-336.
16 Kumar A, Sung M H, Xu J, et al. Data streaming algorithms for efficient and accurate estimation of flow size distribution[C]∥Proceedings of the International Conference on Measurements and Modeling of Computer Systems, New York, NY, USA, 2004: 177-188.
17 Ying S S, Korn F, Saha B, et al. Treescope: finding structural anomalies in semi-structure data[J].Proceedings of the VLDB Endowment, 2015, 8(12): 1904-1907.
18 Regents of the University of California. The CAIDA anonymized internet traces 2018[DB/OL].[2020-10-22].
19 Real-life transactional dataset[DB/OL].[2020-10-22].
20 dataset Webdocs. [DB/OL].[2020-10-22].
[1] Lin MAO,Feng-zhi REN,Da-wei YANG,Ru-bo ZHANG. Two⁃way feature pyramid network for panoptic segmentation [J]. Journal of Jilin University(Engineering and Technology Edition), 2022, 52(3): 657-665.
[2] Xue WANG,Zhan-shan LI,Ying-da LYU. Medical image segmentation based on multi⁃scale context⁃aware and semantic adaptor [J]. Journal of Jilin University(Engineering and Technology Edition), 2022, 52(3): 640-647.
[3] Ji-hong OUYANG,Ze-qi GUO,Si-guang LIU. Dual⁃branch hybrid attention decision net for diabetic retinopathy classification [J]. Journal of Jilin University(Engineering and Technology Edition), 2022, 52(3): 648-656.
[4] Xue-zhi WANG,Qing-liang LI,Wen-hui LI. Spatio⁃temporal model of soil moisture prediction integrated with transfer learning [J]. Journal of Jilin University(Engineering and Technology Edition), 2022, 52(3): 675-683.
[5] Su-ming KANG,Ye-e ZHANG. Hadoop⁃based local timing link prediction algorithm across social networks [J]. Journal of Jilin University(Engineering and Technology Edition), 2022, 52(3): 626-632.
[6] You QU,Wen-hui LI. Single-stage rotated object detection network based on anchor transformation [J]. Journal of Jilin University(Engineering and Technology Edition), 2022, 52(1): 162-173.
[7] Hong-wei ZHAO,Dong-sheng HUO,Jie WANG,Xiao-ning LI. Image classification of insect pests based on saliency detection [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(6): 2174-2181.
[8] Zhou-zhou LIU,Qian-yun ZHANG,Xin-hua MA,Han PENG. Compressed sensing signal reconstruction based on optimized discrete differential evolution algorithm [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(6): 2246-2252.
[9] Sheng-sheng WANG,Jing-yu CHEN,Yi-nan LU. COVID⁃19 chest CT image segmentation based on federated learning and blockchain [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(6): 2164-2173.
[10] Dong-ming SUN,Liang HU,Yong-heng XING,Feng WANG. Text fusion based internet of things service recommendation for trigger⁃action programming pattern [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(6): 2182-2189.
[11] Jun-cong LIN,Jun LEI,Meng CHEN,Shi-hui GUO,Xing GAO,Ming-hong LIAO. Real⁃time camera planning for dynamic multiple targets considering cinematographic visual properties [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(6): 2154-2163.
[12] Li-li REN,Zhi-jun WANG,Dong-mei YAN. Improved multi⁃verse algorithm with combined slime mould foraging behavior [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(6): 2190-2197.
[13] Yin-di YAO,Jun-jin HE,Yang-li LI,Dang-yuan XIE,Ying LI. ET0 simulation of self⁃constructed improved whale optimized BP neural network [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(5): 1798-1807.
[14] Hong-wei ZHAO,Zi-jian ZHANG,Jiao LI,Yuan ZHANG,Huang-shui HU,Xue-bai ZANG. Bi⁃direction segmented anti⁃collision algorithm based on query tree [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(5): 1830-1837.
[15] Jie CAO,Xue QU,Xiao-xu LI. Few⁃shot image classification method based on sliding feature vectors [J]. Journal of Jilin University(Engineering and Technology Edition), 2021, 51(5): 1785-1791.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!