Journal of Jilin University (Engineering and Technology Edition), 2023, Vol. 53, Issue (7): 2109-2114. doi: 10.13229/j.cnki.jdxbgxb.20220348

Text data compression and storage algorithm based on time series model

Yuan-han WENG1,2, Nan LI1

  1. College of Economics and Management, Nanjing University of Aeronautics and Astronautics, Nanjing 211000, China
  2. School of Economics and Management, Nanjing Technology University, Nanjing 211800, China
  • Received: 2022-03-31 Online: 2023-07-01 Published: 2023-07-20
  • Contact: Nan LI E-mail: wengyuanhan2022@163.com; linan20220202@163.com

Abstract:

To reduce the volume of historical text data and improve the efficiency of text data compression and storage, a text data compression and storage algorithm based on a time series model is proposed. First, the wavelet threshold denoising method is used to estimate and eliminate errors and noise in the text data. Next, the features of the text data are described in detail, and the combination and inheritance relationships between feature types are defined to build a time series model. The time series model then converts the preprocessed text data into binary-coded bytes of similar structure, an XOR operation is applied so that the redundant part of the result can be compressed, and the compressed data are stored in the corresponding database, completing the compressed storage of the text data. Simulation results show that the proposed algorithm effectively improves compression performance and achieves more satisfactory compression and storage results for text data.

Key words: time series model, text data, compressed storage algorithm, wavelet threshold denoising method, nonlinear function, pretreatment

CLC Number: TP393

Fig.1 Flow chart of wavelet threshold denoising
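
The denoising stage named in the abstract can be illustrated with a minimal sketch. The page does not give the wavelet basis, the threshold function, or the threshold rule used in the paper, so the Haar wavelet, soft thresholding, and the universal threshold below are assumptions chosen for brevity, and all function names are hypothetical.

```python
import math
import statistics

SQRT_HALF = 1 / math.sqrt(2)

def haar_forward(signal):
    """One level of the orthonormal Haar transform: (approximation, detail)."""
    approx = [(signal[i] + signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) * SQRT_HALF for i in range(0, len(signal), 2)]
    return approx, detail

def haar_inverse(approx, detail):
    """Invert one level of the Haar transform."""
    out = []
    for a, d in zip(approx, detail):
        out.append((a + d) * SQRT_HALF)
        out.append((a - d) * SQRT_HALF)
    return out

def soft_threshold(coeffs, t):
    """Shrink every coefficient toward zero by t (assumed threshold function)."""
    return [math.copysign(max(abs(c) - t, 0.0), c) for c in coeffs]

def wavelet_denoise(signal, levels=2):
    """Decompose, threshold the detail coefficients, and reconstruct."""
    n = len(signal)
    if n < 2 or n % (2 ** levels) != 0:
        raise ValueError("signal length must be a multiple of 2**levels")
    approx = list(signal)
    details = []
    for _ in range(levels):
        approx, detail = haar_forward(approx)
        details.append(detail)  # details[0] holds the finest scale
    # Noise level estimated from the finest details (median absolute deviation),
    # then the universal threshold sigma * sqrt(2 ln n): both are assumptions.
    sigma = statistics.median(abs(d) for d in details[0]) / 0.6745
    threshold = sigma * math.sqrt(2 * math.log(n))
    for detail in reversed(details):  # rebuild from the coarsest scale upward
        approx = haar_inverse(approx, soft_threshold(detail, threshold))
    return approx

# A step signal with a small alternating disturbance added to it.
clean = [1.0] * 4 + [5.0] * 4
noisy = [c + (0.1 if i % 2 == 0 else -0.1) for i, c in enumerate(clean)]
denoised = wavelet_denoise(noisy, levels=2)
```

In this toy input the disturbance lives entirely in the finest detail coefficients, so thresholding removes it while the step itself, carried by the approximation coefficients, is preserved.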

Fig.2 Flow chart of text data compression based on time series model
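
The XOR step described in the abstract can be sketched as follows. This is not the paper's implementation: the fixed record width, the NUL padding, and the zero run-length coding used to shrink the redundant part are all assumptions, and every function name is hypothetical. The idea shown is only that time-ordered records with similar byte layouts XOR to mostly zero bytes, which are cheap to encode.

```python
def xor_bytes(a, b):
    """Byte-wise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))

def rle_zeros(data):
    """Encode each run of zero bytes as (0x00, run_length); copy other bytes."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            run = 1
            while i + run < len(data) and data[i + run] == 0 and run < 255:
                run += 1
            out += bytes([0, run])
            i += run
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def unrle_zeros(data):
    """Inverse of rle_zeros."""
    out = bytearray()
    i = 0
    while i < len(data):
        if data[i] == 0:
            out += bytes(data[i + 1])  # bytes(n) yields n zero bytes
            i += 2
        else:
            out.append(data[i])
            i += 1
    return bytes(out)

def compress(records, width):
    """XOR each fixed-width record with its predecessor, then code the zeros.

    Assumes records contain no NUL characters, since NUL is used as padding.
    """
    prev = bytes(width)
    stream = bytearray()
    for text in records:
        raw = text.encode("utf-8")
        if len(raw) > width:
            raise ValueError("record longer than the fixed width")
        cur = raw.ljust(width, b"\x00")
        stream += xor_bytes(cur, prev)  # identical bytes become 0x00
        prev = cur
    return rle_zeros(bytes(stream))

def decompress(blob, width):
    """Undo compress(): expand the zero runs, then XOR forward again."""
    stream = unrle_zeros(blob)
    prev = bytes(width)
    records = []
    for i in range(0, len(stream), width):
        cur = xor_bytes(stream[i:i + width], prev)
        records.append(cur.rstrip(b"\x00").decode("utf-8"))
        prev = cur
    return records

# Hypothetical time-ordered records that differ in only a few bytes.
records = [
    "2023-07-01 sensor=12 temp=20.5",
    "2023-07-01 sensor=12 temp=20.7",
    "2023-07-01 sensor=12 temp=20.7",
]
blob = compress(records, width=32)
assert decompress(blob, width=32) == records
```

The first record is stored almost verbatim, while each later record costs only a few bytes, so the benefit grows with the similarity between neighbouring records.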

Fig.3 Comparison of text data compression performance test results of different algorithms

Table 1 Comparison of network average energy consumption test results of different algorithms

Running time/min    Network average energy consumption/J
                    Proposed method    Method of Ref. [3]    Method of Ref. [4]
60                  0.25               0.28                  0.32
120                 0.32               0.36                  0.40
180                 0.36               0.41                  0.45
240                 0.40               0.46                  0.50
300                 0.44               0.49                  0.54
360                 0.47               0.53                  0.57
420                 0.53               0.58                  0.62
480                 0.58               0.63                  0.68
540                 0.62               0.68                  0.75
600                 0.67               0.72                  0.79
6000.670.720.79

Fig.4 Comparison of network average delay test results of different algorithms

1 Lu Li-hua, Du Cheng-lie. Multi-layer classification storage simulation of complex network user privacy data[J]. Computer Simulation, 2020, 37(3): 405-408, 439.
2 Xu Jia-yi, Deng Xue-yuan. Research on multi-source heterogeneous BIM data storage method for operation and maintenance stage[J]. Architecture Technology, 2020, 51(5): 529-533.
3 Wang He, Li Shi-qiang, Yu Hua-nan, et al. Compression acquisition method for power quality data of distribution network based on distributed compressed sensing and edge computing[J]. Transactions of China Electrotechnical Society, 2020, 35(21): 4553-4564.
4 Xu Jing-hua, Gao Ming-yu, Gou Hua-wei, et al. Storage and reconstruction method of sparse matrix for 3D printing based on irregular block compression[J]. Chinese Journal of Computers, 2020, 43(11): 2203-2215.
5 Qu Song-lin, Liu Lin. Data compression algorithm for railway air interface monitoring based on waveform dictionary[J]. Computer Application Research, 2020, 37(Sup.2): 266-269, 244.
6 Li Jian-xin, Chen Hong, Wang Jin-qi. A smoothing algorithm based on machine vision contour extraction[J]. Application of Electronic Technique, 2021, 47(4): 116-120, 131.
7 Chen Chen, Tao Jian-feng, Zheng Gui-mei. Unitary ESPRIT algorithm of polarization smoothing dimension reduction based on MIMO radar[J]. Journal of Signal Processing, 2021, 37(4): 616-623.
8 Li Zhi-jun, Zhang Hong-peng, Wang Ya-nan, et al. Wavelet threshold denoising harmonic detection method based on permutation entropy-CEEMD decomposition[J]. Electric Machines and Control, 2020, 24(12): 120-129.
9 Su Chang-peng, Wang Xue-mei, Xu Zhe, et al. Research on wavelet threshold de-noising method based on new threshold function[J]. Tactical Missile Technology, 2020(3): 66-72.
10 Kang Ming, Han Sen-ping, Yang Hong-jie, et al. Data preprocessing method for infrared spectra analysis of natural gas components[J]. Infrared Technology, 2021, 43(8): 804-808.
11 Zhang Hai-tao, Tang Ru-feng, Li Zhu-lian, et al. Preprocessing method of laser ranging data based on array detection technology[J]. Infrared and Laser Engineering, 2020, 49(8): 89-98.
12 Wu Yi-lin, Nan Jin-ling. Forecasting of advertisement income of internet companies: based on neural network and time series model for low-frequency data[J]. Statistical Research, 2020, 37(5): 94-103.
13 Wu Xiao-feng, Lin Xiao-yan, Jin Ya-nan. Integrated combination prediction model based on time series model selection[J]. Statistics and Decision, 2021, 37(9): 5-8.
14 Li Rao-bo, Yuan Xi-ping, Gan Shu, et al. Point cloud data compression method based on feature point and key point extraction[J]. Laser & Infrared, 2021, 51(9): 1129-1136.
15 Zhao Dong-bao, Meng Jun-zhen, Liu Wen-yu. Feature points mapping data compression method for multiple similar trajectories[J]. Science of Surveying and Mapping, 2020, 45(3): 143-149.