吉林大学学报(工学版) ›› 2024, Vol. 54 ›› Issue (6): 1756-1766.doi: 10.13229/j.cnki.jdxbgxb.20220865

• 计算机科学与技术 • 上一篇    

基于稀疏多头自注意力的轨迹kNN查询方法

张丽平(),刘斌毓,李松,郝忠孝   

  1. 哈尔滨理工大学 计算机科学与技术学院,哈尔滨 150080
  • 收稿日期:2022-07-07 出版日期:2024-06-01 发布日期:2024-07-23
  • 作者简介:张丽平(1976-),女,博士,副教授.研究方向:数据库理论及应用,数据挖掘,大数据. E-mail:zhangliping0730@163.com
  • 基金资助:
    国家自然科学基金项目(62072136);黑龙江省自然科学基金项目(LH2023F031);国家重点研发计划项目(2020YFB1710200)

Trajectory k nearest neighbor query method based on sparse multi-head attention

Li-ping ZHANG(),Bin-yu LIU,Song LI,Zhong-xiao HAO   

  1. School of Computer Science and Technology,Harbin University of Science and Technology,Harbin 150080,China
  • Received:2022-07-07 Online:2024-06-01 Published:2024-07-23

摘要:

针对基于位置信息的应用产生的时空数据体积量巨大且带有经纬度、时间等多维属性,基于索引的轨迹查询方法无法获取轨迹的时空语义特征,其轨迹表示方式忽略了轨迹的时空相关性并且对于大规模轨迹数据查询效率较低的问题,提出了一种基于稀疏多头自注意力机制的轨迹编码器。通过轨迹编码器可以提取轨迹的高阶语义特征,将轨迹表示为轨迹编码向量。在轨迹编码向量的基础上,提出了一种基于局部敏感哈希函数的轨迹查询方法,可以快速对大规模轨迹数据进行查询。理论研究和实验结果表明:本文轨迹查询方法在查询准确率和查询效率上优于目前已有的轨迹查询方法。

关键词: 计算机应用, 时空数据, 轨迹数据挖掘, 深度学习, 轨迹表示学习, 多头自注意力

Abstract:

The application of location information generates a large volume of spatio-temporal data with multidimensional attributes such as latitude, longitude and time. The index-based trajectory query method cannot obtain the spatio-temporal semantic features of the trajectory, and its trajectory representation ignores the spatio-temporal correlation of the trajectory and is inefficient for large-scale trajectory data query. To solve the above problems, a kind of trajectory encoder based on sparse multi-head attention is proposed, through the trajectory of the encoder can extract high-level semantic features, trajectories are represented as encoding vectors. On the basis of trajectory coding vector, a trajectory query method based on locally sensitive hash function is developed, which can query large-scale trajectory data quickly. The experiment results show that the proposed trajectory query method can effectively deal with kNNT query problems and it is superior to the current trajectory query method in terms of accuracy and efficiency.

Key words: computer application, spatio-temporal data, trajectory data mining, deep learning, trajectory representation learning, multi-head self-attention

中图分类号: 

  • TP311

图1

轨迹编码向量获取方法流程图"

图2

POI实体连接图"

图3

轨迹编码器"

表1

轨迹信息表"

移动方式移动总距离/m移动总用时/h
徒步10 1235 460
自行车6 4952 410
公交车20 2811 507
汽车32 8662 384
火车36 253745
飞机24 78940
其他9 493404

图4

训练数据流"

图5

k值对查询时间的影响"

图6

k值对查询准确率的影响"

表2

k值对查询时间的影响 (s)"

方法k
10501005001 0005 000
DGrid46.6546.9147.3450.7852.2352.31
Geohash37.9938.0436.4336.9436.7138.25
O2LSH17.2117.3617.8017.9319.7119.82
LR-Tree43.9941.8342.9643.2244.5245.63
ER-Tree39.6238.5440.2141.3045.1145.23
SMHkNN16.9416.8616.8716.9717.4417.03

图7

k值对查询准确率的影响"

表3

数据规模对查询时间的影响 (s)"

方法数据规模
1 0005 00010 000100 000
DGrid5.6324.6546.71276.85
Geohash5.2419.4736.43249.66
O2LSH4.677.4323.3226.35
LR-Tree4.5815.4637.83259.11
ER-Tree4.5214.6236.96253.75
SMHkNN4.248.4316.8724.68

图8

数据规模对查询准确率的影响"

图9

最优训练损失值变化图"

1 Ugwoke P O, Bakpo F S, Udanor C N, et al. A framework for monitoring movements of pandemic disease patients based on GPS trajectory datasets[J]. Wireless Networks, 2022, 28(1): 1-28.
2 Belhassena A, Wang H. Trajectory big data processing based on frequent activity[J]. Tsinghua Science and Technology, 2019, 24(3): 317-332.
3 Sojahrood Z B, Taleai M. Behavior-based POI recommendation for small groups in location-based social networks[J]. Transactions in GIS, 2021,26(1):259-277.
4 李松, 胡晏铭, 郝晓红, 等. 基于维度分组降维的高维数据近似k近邻查询[J]. 计算机研究与发展, 2021, 58(3): 609-623.
Li Song, Hu Yan-ming, Hao Xiao-hong, et al. Approximate k-nearest neighbor query of high dimensional data based on dimension grouping and reducing[J]. Journal of Computer Research and Development, 2021, 58(3): 609-623.
5 Qi S, Bouros P, Sacharidis D, et al. Efficient point-based trajectory Search[C]∥Advances in Spatial and Temporal Databases, Hong Kong, China, 2015: 179-196.
6 Tang L A, Zheng Y, Xie X, et al. Retrieving k-nearest neighboring trajectories by a set of point locations[C]∥Symposium on Large Spatial Databases, Minneapolis,USA, 2011: 223-241.
7 姜乔文, 刘瑜, 谭大宁, 等. 时空轨迹多维特征融合的行为规律挖掘算法[J]. 航空学报, 2022, 43(9): No.32394.
Jiang Qiao-wen, Liu Yu, Tan Da-ning, et al. Regular behaviors mining algorithm based on spatiotemporal trajectory multidimensional features fusion[J]. Acta Aeronautica ET Astronautica Sinica,2022, 43(9): No.32394.
8 Roh Gook-pil, Roh Jong-won, Hwang Seung-won, et al. Supporting pattern-matching queries over trajectories on road networks[J]. IEEE Transactions on Knowledge and Data Engineering, 2011, 23(11): 1753-1758.
9 Eiter T, Mannila H. Computing discrete Fréchet distance[EB/OL]. [2022-07-01].
10 Yi Byoung-Kee, Jagadish H V, Christos Faloutsos. Efficient retrieval of similar time sequences under time warping[C]∥Proceedings of the Fourteenth International Conference on Data Engineering, Orlando,USA, 1998:201-208.
11 Vlachos Michail, Kollios George, Gunopulos Dimitrios. Discovering similar multidimensional trajectories[C]∥ICDE 2002, San Jose,USA, 2002: 673-684.
12 Chen Lei. Tamer Özsu M, Oria V. Robust and fast similarity search for moving object trajectories[C]∥SIGMOD 2005, Baltimore,USA, 2005: 491-502.
13 许建秋, 梁珺秀, 秦小麟. 基于时空标签轨迹的k近邻模式匹配查询[J]. 通信学报, 2018, 39(4): 112-122.
Xu Jian-qiu, Liang Jun-xiu, Qin Xiao-lin. k nearest neighbor pattern match queries over spatio-temporal label trajectories[J]. Journal on Communications, 2018, 39(4): 112-122.
14 Xu J, Hartmut G R, Gao Y. Continuous k nearest neighbor queries over large multi-attribute trajectories: a systematic approach[J]. GeoInformatica, 2018, 22(4): 1-44.
15 余列冰, 向隆刚, 孙尚宇, 等. 面向分布式列式存储的轨迹大数据k近邻查询[J]. 武汉大学学报: 信息科学版, 2020, 46(5): 736-745.
Yu Lie-bing, Xiang Long-gang, Sun Shang-yu, et al. kNN query processing for trajectory big data based on distributed column-oriented storage[J]. Geomatics and Information Science of Wuhan University, 2020, 46(5): 736-745.
16 Yu Z. Trajectory data mining: an overview[J]. ACM Transactions on Intelligent Systems and Technology, 2015, 6(3): 1-41.
17 Guttman A. R-tree: a dynamic index structure for spatial searching[C]∥Proc ACMSIGMOD,San Francisco, USA, 1984: 90-100.
18 Kamel I, Faloutsos C.Hilbert R-tree: an improved R-tree using fractals[C]∥Proceedings of the 20th International Conference on Very Large Data Bases, Burlington, USA, 1994: 500-509.
19 Papadias D, Zhang D, Kollios G. Proceedings of the 10th international conference on Advances in spatial and temporal databases[C]∥International Conference on Advances in Spatial & Temporal Databases. Boston: Springer-Verlag, 2007.
20 Bowman S R, Vilnis L, Vinyals O, et al. Generating sentences from a continuous space[C]∥Proceedings of the 20th SIGNLL Conference on Computational Natural Language Learning,Berlin, Germany, 2016: 10-21.
21 Wang Sheng, Bao Zhi-feng, Shane C, et al. A survey on trajectory data management, analytics, and learning[J]. ACM Computing Surveys, 2021, 54(2): No.39.
22 Mikolov T, Chen K, Corrado G, et al. Efficient estimation of word representations in vector space[EB/OL]. [2022-07-02].
23 Rong X. Word2vec parameter learning explained[J/OL]. [2022-07-02].
24 He K, Zhang X, Ren S, et al. Deep residual learning for image recognition[C]∥Conference on Computer Vision and Pattern Recognition, Las Vegas,USA, 2016: 770-778.
25 章登义, 李想. 一种基于密度网格索引的k-最近邻查询算法[J]. 电子学报, 2017, 45(2): 376-383.
Zhang Deng-yi, Li Xiang. A k-nearest neighbor query algorithm for density grid-based index[J]. Acta Electronica Sinica, 2017, 45(2): 376-383.
26 向隆刚, 王德浩, 龚健雅. 大规模轨迹数据的Geohash编码组织及高效范围查询[J]. 武汉大学学报: 信息科学版, 2017, 42(1): 21-27.
Xiang Long-gang, Wang De-hao, Gong Jian-ya. Organization and efficient range query of large trajectory data based on geohash[J]. Geomatics and Information Science of Wuhan University, 2017, 42(1): 21-27.
27 冯小康, 彭延国, 崔江涛, 等. 基于最优排序的局部敏感哈希索引[J].计算机学报, 2020, 43(5):930-947.
Feng Xiao-kang, Peng Yan-guo, Cui Jiang-tao, et al. Locality sensitive hashing index based on optimal linear order[J]. Chinese Journal of Computers, 2020, 43(5): 930-947.
28 Ramadhan Hani, Kwon Joonho. Enhancing learned index for a higher recall trajectory k-Nearest neighbor search[C]∥2021 IEEE International Conference on Big Data, Orlando, USA, 2021: 6006-6007.
[1] 陆玉凯,袁帅科,熊树生,朱绍鹏,张宁. 汽车漆面缺陷高精度检测系统[J]. 吉林大学学报(工学版), 2024, 54(5): 1205-1213.
[2] 梁礼明,周珑颂,尹江,盛校棋. 融合多尺度Transformer的皮肤病变分割算法[J]. 吉林大学学报(工学版), 2024, 54(4): 1086-1098.
[3] 张云佐,郭威,李文博. 遥感图像密集小目标全方位精准检测算法[J]. 吉林大学学报(工学版), 2024, 54(4): 1105-1113.
[4] 杨国俊,齐亚辉,石秀名. 基于数字图像技术的桥梁裂缝检测综述[J]. 吉林大学学报(工学版), 2024, 54(2): 313-332.
[5] 李雄飞,宋紫萱,朱芮,张小利. 基于多尺度融合的遥感图像变化检测模型[J]. 吉林大学学报(工学版), 2024, 54(2): 516-523.
[6] 陈岳林,高铸成,蔡晓东. 基于BERT与密集复合网络的长文本语义匹配模型[J]. 吉林大学学报(工学版), 2024, 54(1): 232-239.
[7] 霍光,林大为,刘元宁,朱晓冬,袁梦,盖迪. 基于多尺度特征和注意力机制的轻量级虹膜分割模型[J]. 吉林大学学报(工学版), 2023, 53(9): 2591-2600.
[8] 何颖,王卓然,周旭,刘衍珩. 融合社交地理信息加权矩阵分解的兴趣点推荐算法[J]. 吉林大学学报(工学版), 2023, 53(9): 2632-2639.
[9] 张云佐,董旭,蔡昭权. 拟合下肢几何特征的多视角步态周期检测[J]. 吉林大学学报(工学版), 2023, 53(9): 2611-2619.
[10] 肖明尧,李雄飞,朱芮. 基于NSST域像素相关分析的医学图像融合[J]. 吉林大学学报(工学版), 2023, 53(9): 2640-2648.
[11] 赵亚慧,李飞雨,崔荣一,金国哲,张振国,李德,金小峰. 基于跨语言预训练模型的朝汉翻译质量评估[J]. 吉林大学学报(工学版), 2023, 53(8): 2371-2379.
[12] 金小俊,孙艳霞,于佳琳,陈勇. 基于深度学习与图像处理的蔬菜苗期杂草识别方法[J]. 吉林大学学报(工学版), 2023, 53(8): 2421-2429.
[13] 车翔玖,徐欢,潘明阳,刘全乐. 生物医学命名实体识别的两阶段学习算法[J]. 吉林大学学报(工学版), 2023, 53(8): 2380-2387.
[14] 耿庆田,刘植,李清亮,于繁华,李晓宁. 基于一种深度学习模型的土壤湿度预测[J]. 吉林大学学报(工学版), 2023, 53(8): 2430-2436.
[15] 王连明,吴鑫. 基于姿态估计的物体3D运动参数测量方法[J]. 吉林大学学报(工学版), 2023, 53(7): 2099-2108.
Viewed
Full text


Abstract

Cited

  Shared   
  Discussed   
No Suggested Reading articles found!