Journal of Jilin University (Engineering and Technology Edition), 2022, Vol. 52, Issue (10): 2325-2332. doi: 10.13229/j.cnki.jdxbgxb20210231


Algorithm for repairing abnormal toll data of expressway based on SSC and XGBoost

Li-li PEI, Zhao-yun SUN, Yu-xi HAN, Wei LI, Yuan-jiao HU

  1. School of Information Engineering, Chang'an University, Xi'an 710064, China
  • Received: 2021-03-15  Online: 2022-10-01  Published: 2022-11-11
  • Contact: Zhao-yun SUN  E-mail: peilili@chd.edu.cn; zhaoyunsun@126.com

Abstract:

To detect and repair anomalies in expressway toll data, this paper proposes an anomaly detection algorithm based on the SSC (Sum of Similar Coefficients), which is suited to joint detection over multidimensional readings, and an anomaly repair algorithm that repairs data through multidimensional prediction with XGBoost (eXtreme Gradient Boosting). Both methods are tested on toll data. The results show that SSC takes the correlation between data dimensions into account and can accurately detect anomalies in multidimensional data. Compared with an improved Lagrange interpolation method, the proposed repair method raises R² from 0.9166 to 0.9856. The proposed algorithms are effective and can provide high-quality data support for data analysis and statistics in expressway management departments.
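The R² values quoted above are presumably the standard coefficient of determination between true and repaired values, R² = 1 - SS_res / SS_tot. The excerpt does not spell out the definition, so the following is a minimal sketch under that assumption; the function name `r_squared` and the sample values are illustrative only.

```python
def r_squared(actual, predicted):
    """Coefficient of determination R^2 = 1 - SS_res / SS_tot."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("inputs must be non-empty and of equal length")
    mean = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # residual sum of squares
    ss_tot = sum((a - mean) ** 2 for a in actual)  # total sum of squares
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all actual values are equal")
    return 1.0 - ss_res / ss_tot


# Made-up example: true loads (100 kg) vs. repaired values
true_vals = [40, 35, 46, 25, 38]
repaired = [41, 34, 45, 26, 38]
print(round(r_squared(true_vals, repaired), 4))  # prints 0.9832
```

An R² of 1 means the repaired values match the true values exactly; the closer to 1, the smaller the residual error relative to the spread of the true data.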

Key words: traffic information engineering, anomaly detection and repair, sum of similar coefficients, XGBoost, K-means

CLC Number: U495

Table 1

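Part of the feature factors in the original toll data

Feature name | Meaning | Unit | Sample data
ID | Record serial number | / | 50968476
InTime | Entry time | / | 2016/10/26 09:46:50
OutTime | Exit time | / | 6:28
InStation Name | Entry station name | / | 15
OutStation Name | Exit station name | / | 16
InLoad | Total vehicle weight at entry (entry load) | 100 kg | 40
OutLoad | Total vehicle weight at exit (exit load) | 100 kg | 40
Credit | Amount charged |  | 555.75
Last Balance | Balance after the transaction |  | 2345.5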

Table 2

Statistical characteristics of the toll data

Statistic | OutLoad/100 kg | OutStation Name | InLoad/100 kg | InStation Name
Count | 866.00 | 879.00 | 869.00 | 879.00
Mean | 39.4942 | 4.7610 | 39.33 | 4.54
Standard deviation | 21.5662 | 3.9037 | 21.63 | 3.73
Minimum | 10.0000 | 1.0000 | 1.00 | 0.00
25% | 25.0000 | 2.0000 | 25.00 | 2.000000
50% | 35.0000 | 3.0000 | 35.00 | 3.000000
75% | 46.0000 | 7.0000 | 46.00 | 6.000000
Maximum | 1000.0000 | 16.0000 | 83.00 | 20.000000
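The rows of Table 2 (count, mean, standard deviation, minimum, quartiles, maximum) follow the layout of pandas' `DataFrame.describe()`, which leaves missing entries out of the count; that is presumably why the counts differ between columns. The sketch below reproduces these statistics with the standard library only, assuming describe-style conventions (sample standard deviation, linearly interpolated quartiles). The `describe` function and the sample column are hypothetical.

```python
import statistics


def describe(values):
    """Table 2-style summary statistics; None entries are treated as missing."""
    data = [v for v in values if v is not None]
    # 'inclusive' quartiles use linear interpolation, like pandas' default
    q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return {
        "count": len(data),
        "mean": statistics.fmean(data),
        "std": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "min": min(data),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(data),
    }


# Hypothetical OutLoad column (100 kg) with one missing entry and one extreme value
out_load = [40, 25, 35, None, 46, 1000]
for name, value in describe(out_load).items():
    print(f"{name}: {value}")
```

In this made-up column the extreme value shows up in the maximum while the quartiles stay in the normal range, much like OutLoad's maximum of 1000 against a 75% quantile of 46 in Table 2.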

Fig.1

Steps of the anomaly detection algorithm based on Euclidean distance
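The figure itself is not reproduced here, so the exact steps of the Euclidean-distance detector are unknown. As a minimal sketch of one common variant, assumed rather than taken from the paper: standardize each column, measure each record's Euclidean distance to the centroid, and flag records whose distance exceeds the mean distance plus k standard deviations. The threshold rule, the default k = 2 and the function names are hypothetical.

```python
import math
import statistics


def zscore_columns(rows):
    """Standardize every column to zero mean and unit (population) standard deviation."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]  # avoid dividing by zero for constant columns
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def euclidean_outliers(rows, k=2.0):
    """Return indices of rows whose distance to the centroid exceeds mean + k * std."""
    z = zscore_columns(rows)
    # After z-scoring, the centroid is the origin
    dists = [math.sqrt(sum(v * v for v in row)) for row in z]
    cutoff = statistics.fmean(dists) + k * statistics.pstdev(dists)
    return [i for i, d in enumerate(dists) if d > cutoff]


# Hypothetical (InLoad, OutLoad) records in 100 kg; the last OutLoad is implausible
rows = [(40, 40), (35, 35), (25, 25), (46, 46), (38, 38),
        (30, 30), (42, 42), (28, 28), (33, 33), (40, 1000)]
print(euclidean_outliers(rows))  # prints [9]
```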

Fig.2

Steps of the anomaly detection algorithm based on the sum of similar coefficients
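The excerpt does not define the similarity coefficient behind SSC, so the sketch below assumes one purely for illustration: on z-scored records, the similarity of records i and j is s_ij = 1 / (1 + d_ij), where d_ij is their Euclidean distance, and SSC_i is the sum of s_ij over all j != i. Records with an unusually low SSC are flagged. Both the coefficient and the mean - k * std threshold are assumptions, not the paper's definitions.

```python
import math
import statistics


def standardize(rows):
    """Z-score every column (population standard deviation)."""
    cols = list(zip(*rows))
    means = [statistics.fmean(c) for c in cols]
    stds = [statistics.pstdev(c) or 1.0 for c in cols]
    return [[(x - m) / s for x, m, s in zip(row, means, stds)] for row in rows]


def ssc_scores(rows):
    """SSC_i = sum over j != i of s_ij.

    ASSUMPTION: s_ij = 1 / (1 + d_ij), with d_ij the Euclidean distance
    between z-scored records; the paper's own coefficient may differ.
    """
    z = standardize(rows)
    return [
        sum(1.0 / (1.0 + math.dist(a, b)) for j, b in enumerate(z) if j != i)
        for i, a in enumerate(z)
    ]


def ssc_outliers(rows, k=2.0):
    """Flag records whose SSC is below mean - k * std (hypothetical threshold rule)."""
    scores = ssc_scores(rows)
    cutoff = statistics.fmean(scores) - k * statistics.pstdev(scores)
    return [i for i, s in enumerate(scores) if s < cutoff]


# Hypothetical (InLoad, OutLoad) records; the last one is in range column by column
rows = [(20, 20), (25, 25), (30, 30), (35, 35), (40, 40),
        (45, 45), (50, 50), (55, 55), (60, 60), (20, 60)]
print(ssc_outliers(rows, k=1.5))  # prints [9]
```

Here (20, 60) has an ordinary value in each column taken alone, yet it is far from every other record jointly, so its SSC is the lowest. In z-scored coordinates it is actually closer to the centroid than (20, 20) or (60, 60), so a plain distance-to-centroid rule would not single it out. The value k = 1.5 is used only because the sample is tiny.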

Table 3

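XGBoost model parameters

Model parameter | Explanation
n_estimators | Optimal number of boosting iterations (trees)
max_depth | Maximum tree depth
min_child_weight | Minimum sum of sample weights in a leaf (child) node
subsample | Fraction of training samples randomly sampled for each tree
colsample_bytree | Fraction of columns randomly sampled for each tree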
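Table 3 lists the parameters tuned for the XGBoost repair model, but the excerpt does not report the chosen values. In xgboost's scikit-learn interface these are keyword arguments of `xgboost.XGBRegressor` (the number of trees is spelled `n_estimators` there). A minimal sketch of enumerating candidate settings for a grid search follows; the search values are invented, and `fit_candidate` needs the `xgboost` package, which is imported only when the function is called.

```python
from itertools import product

# Hypothetical search space over the Table 3 parameters; the values are
# illustrative only and are not the settings reported in the paper.
SEARCH_SPACE = {
    "n_estimators": [100, 300],
    "max_depth": [3, 6],
    "min_child_weight": [1, 5],
    "subsample": [0.8, 1.0],
    "colsample_bytree": [0.8, 1.0],
}


def param_grid(space):
    """Expand a dict of candidate lists into every parameter combination."""
    keys = list(space)
    return [dict(zip(keys, combo)) for combo in product(*(space[k] for k in keys))]


def fit_candidate(params, X, y):
    """Fit one candidate model; requires the xgboost package (imported lazily)."""
    from xgboost import XGBRegressor

    model = XGBRegressor(**params)
    model.fit(X, y)
    return model


grid = param_grid(SEARCH_SPACE)
print(len(grid), grid[0])
```

Each candidate would typically be scored on held-out data, for instance with R², and the best-scoring combination kept; that selection loop is omitted here.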

Fig.3

Clustering results of the original data
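The key words list K-means, and this figure shows clustering results for the original data, but the excerpt does not describe how the clustering is configured. For reference, a minimal sketch of the standard Lloyd's k-means iteration (assign each point to its nearest centroid, then move each centroid to the mean of its members) is given below; the function signature, random initialization and stopping rule are choices made for this sketch. Features on very different scales should be standardized first, since k-means relies on Euclidean distance.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Plain Lloyd's k-means; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]  # initialize from k data points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by Euclidean distance
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster becomes empty
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return centroids, labels


points = [(1, 1), (1.5, 2), (2, 1.5), (10, 10), (10.5, 11), (11, 10.5)]
centroids, labels = kmeans(points, k=2)
print(labels)
```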

Fig.4

Results of data cleaning

Fig.5

Comparison of expressway toll data anomaly repair results
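The abstract describes repairing anomalies by predicting them from the other data dimensions with XGBoost. The sketch below shows only the shape of that repair step under stated assumptions: a model is trained on records not flagged as anomalous, and its predictions replace the flagged values. To stay dependency-free it uses a one-variable least-squares line as a stand-in for the XGBoost regressor, and predicting OutLoad from InLoad is a hypothetical choice of target and feature.

```python
def fit_line(xs, ys):
    """Least-squares fit of y = a + b * x (a stand-in for the XGBoost regressor)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx
    return my - b * mx, b


def repair(xs, ys, flagged):
    """Replace ys[i] for every flagged index i with the model's prediction from xs[i].

    The model is fitted only on records that were not flagged, so the
    anomalies do not distort it.
    """
    bad = set(flagged)
    train = [(x, y) for i, (x, y) in enumerate(zip(xs, ys)) if i not in bad]
    a, b = fit_line([x for x, _ in train], [y for _, y in train])
    return [a + b * x if i in bad else y for i, (x, y) in enumerate(zip(xs, ys))]


# Hypothetical InLoad / OutLoad columns (100 kg); index 4 was flagged by a detector
in_load = [40, 35, 25, 46, 38, 30]
out_load = [41, 36, 26, 47, 1000, 31]
print(repair(in_load, out_load, flagged=[4]))
```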

Fig.6

Comparison of high-precision expressway toll data anomaly repair results

Fig.7

Expressway toll data anomaly repair results comparison
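The baseline in the abstract is an improved Lagrange interpolation whose modifications are not described in this excerpt. For comparison, the classic Lagrange form evaluates the polynomial through known points (x_i, y_i) as L(x) = sum_i y_i * prod_{j != i} (x - x_j) / (x_i - x_j). A minimal sketch of that unmodified form, which could fill a gap from neighbouring readings, follows; the function name and sample points are illustrative.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the Lagrange polynomial passing through (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        basis = 1.0  # i-th Lagrange basis polynomial l_i(x)
        for j, xj in enumerate(xs):
            if j != i:
                basis *= (x - xj) / (xi - xj)
        total += yi * basis
    return total


# Hypothetical readings at time steps 1, 2, 4, 5; estimate the missing step 3
steps = [1, 2, 4, 5]
readings = [30.0, 34.0, 41.0, 43.0]
print(lagrange_interpolate(steps, readings, 3))
```

Because the polynomial degree grows with the number of points, interpolating through many readings can oscillate badly (Runge's phenomenon), so in practice only a few neighbours around the gap are used.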
