Journal of Jilin University (Engineering and Technology Edition), 2022, Vol. 52, Issue 10: 2325-2332. doi: 10.13229/j.cnki.jdxbgxb20210231
• Transportation Engineering · Civil Engineering •
Li-li PEI, Zhao-yun SUN, Yu-xi HAN, Wei LI, Yuan-jiao HU
Abstract:
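To address anomaly detection and repair in expressway toll data, this paper proposes an anomaly detection algorithm based on similarity coefficients and the sum of similar coefficients (SSC), together with a multidimensional data prediction and repair method based on XGBoost (eXtreme Gradient Boosting). Both algorithms are applied to real toll data for anomaly detection and repair. The results show that the SSC-based detection algorithm accounts for the correlation among data dimensions and accurately detects anomalies in multidimensional data. Compared with an improved Lagrange algorithm that handles only single-dimensional data, the XGBoost multivariate prediction algorithm raises R² from 0.9166 to 0.9856. The proposed algorithms are effective and accurate, and can provide high-quality data support for data analysis by highway management departments.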
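The abstract does not give the exact definition of the similarity coefficient, so the following is only a minimal sketch of the general idea, not the authors' implementation. It assumes cosine similarity as the similarity coefficient, defines a record's SSC as the sum of its similarities to all other records, and flags records whose SSC falls well below the mean. The function names, the threshold rule with `k`, and the sample `toll_records` (hypothetical mileage, toll, and travel-time values) are all illustrative assumptions. The `r_squared` helper is the standard coefficient of determination, the metric the abstract uses to compare the repair methods.

```python
import math
from statistics import mean, pstdev


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # assumption: a zero vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def ssc_scores(records):
    """For each record, sum its similarity coefficients with every other record."""
    return [
        sum(cosine_similarity(r, other)
            for j, other in enumerate(records) if j != i)
        for i, r in enumerate(records)
    ]


def flag_anomalies(records, k=1.5):
    """Indices whose SSC lies more than k standard deviations below the mean.

    The threshold rule is a hypothetical choice for illustration only.
    """
    scores = ssc_scores(records)
    mu, sigma = mean(scores), pstdev(scores)
    return [i for i, s in enumerate(scores) if s < mu - k * sigma]


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical records: (mileage in km, toll in yuan, travel time in min).
# The last record has a toll that is inconsistent with its other dimensions.
toll_records = [
    (10, 5, 12),
    (20, 10, 24),
    (30, 15, 36),
    (40, 20, 48),
    (25, 90, 30),
]

print(flag_anomalies(toll_records))
```

Because the score compares whole records rather than single columns, the last record is flagged even though each of its values lies within a plausible range on its own, which is the kind of cross-dimension inconsistency the abstract says the SSC approach is meant to capture. A flagged value could then be replaced by the prediction of a multivariate regressor trained on the clean records, and `r_squared` would measure how closely such predictions match held-out true values.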