Trajectory prediction model for intelligent connected vehicle

doi:10.13229/j.cnki.jdxbgxb.20231046

Abstract

Traditional single-vehicle autonomous driving systems can predict the future only from their own perception of the environment. Intelligent connected autonomous driving systems, by contrast, can use V2X technology to obtain additional dynamic information about the surrounding road environment and fuse it into their predictions. Building on single-vehicle trajectory prediction, this paper uses a specialized encoder that lets the trajectory prediction model fuse its own perception information with the dynamic road information shared via V2X. Experiments on a CARLA simulation dataset show that a model using dynamic road-environment information obtained via V2X predicts vehicle trajectories more accurately than trajectory prediction algorithms that do not use dynamic environment information.

Key words: computer application technology, autonomous driving, internet of vehicles, trajectory prediction

CLC number: TP399

Jian WANG, Chen-wei JIA. Trajectory prediction model for intelligent connected vehicle[J]. Journal of Jilin University (Engineering and Technology Edition), 2025, 55(6): 1963-1972.

Figures/Tables: 14 (Tables 1-6, Figures 1-8)

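Index	Meaning
0	Lane status unknown
1	Lane open to traffic (controlled by a green light)
2	Lane controlled by a yellow light
3	Lane closed to traffic (controlled by a red light)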
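
The index scheme in the table above maps naturally onto an enumeration. The following is a minimal sketch, not the authors' code: the class name `LaneSignalState`, its member names, and the helper `one_hot` are hypothetical, and one-hot encoding is only an assumption about how such an index might be fed to a model.

```python
from enum import IntEnum
from typing import List


class LaneSignalState(IntEnum):
    """Lane status indices from the table above (member names are hypothetical)."""
    UNKNOWN = 0  # lane status unknown
    GREEN = 1    # open to traffic, controlled by a green light
    YELLOW = 2   # controlled by a yellow light
    RED = 3      # closed to traffic, controlled by a red light


def one_hot(state: LaneSignalState) -> List[float]:
    """Return a 4-element one-hot vector for the state (assumed encoding)."""
    vec = [0.0] * len(LaneSignalState)
    vec[int(state)] = 1.0
    return vec
```

For example, `one_hot(LaneSignalState.RED)` sets only the last of the four positions.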

Index	Meaning
0~21	Lane type
22~25	Lane direction
26~36	Left lane type
37~47	Right lane type
48	Whether the lane is in an intersection
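
The index ranges in the table above describe a 49-element lane feature vector. The sketch below shows one way such a vector could be assembled. It assumes a one-hot encoding within each segment and a single 0/1 flag at index 48, neither of which the table states, and every name in it (`LANE_FEATURE_SLICES`, `encode_lane`, the argument names) is hypothetical.

```python
from typing import Dict, List, Tuple

# Inclusive index ranges taken from the table above; the key names are hypothetical.
LANE_FEATURE_SLICES: Dict[str, Tuple[int, int]] = {
    "lane_type": (0, 21),
    "lane_direction": (22, 25),
    "left_lane_type": (26, 36),
    "right_lane_type": (37, 47),
    "in_intersection": (48, 48),
}
LANE_FEATURE_DIM = 49


def encode_lane(lane_type: int, direction: int, left_type: int,
                right_type: int, in_intersection: bool) -> List[float]:
    """Build a 49-element vector, one-hot within each segment (assumed encoding)."""
    vec = [0.0] * LANE_FEATURE_DIM
    choices = {
        "lane_type": lane_type,
        "lane_direction": direction,
        "left_lane_type": left_type,
        "right_lane_type": right_type,
    }
    for name, offset in choices.items():
        start, end = LANE_FEATURE_SLICES[name]
        # Each argument is an offset within its own segment, starting at 0.
        if not 0 <= offset <= end - start:
            raise ValueError(f"{name} offset {offset} outside 0..{end - start}")
        vec[start + offset] = 1.0
    # The final element is a plain binary flag.
    vec[LANE_FEATURE_SLICES["in_intersection"][0]] = 1.0 if in_intersection else 0.0
    return vec
```

Keeping the ranges in one dictionary means the layout is defined in a single place and out-of-range category values fail loudly instead of silently writing into a neighbouring segment.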

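City	Description	Training samples	Validation samples
1	Small urban traffic scene 1	100 000	25 000
2	Small urban traffic scene 2	100 000	25 000
3	Large city map with roundabouts and large intersections	100 000	25 000
4	Small urban traffic scene with a figure-eight road	100 000	25 000
5	Square-grid urban traffic scene	100 000	25 000
6	Highway scene with long highways and many entrances and exits, including a Michigan left intersection	100 000	25 000
7	Rural road scene with narrow roads and almost no traffic lights	100 000	25 000
8	Downtown scene of a large city	100 000	25 000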
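
The per-city counts in the table above can be totalled with a few lines of arithmetic. This is only a worked check on the published numbers; the variable names are illustrative.

```python
# Per-city sample counts from the table above (identical for all eight cities).
TRAIN_PER_CITY = 100_000
VAL_PER_CITY = 25_000
NUM_CITIES = 8

train_total = TRAIN_PER_CITY * NUM_CITIES   # 800_000
val_total = VAL_PER_CITY * NUM_CITIES       # 200_000
all_samples = train_total + val_total       # 1_000_000
val_fraction = val_total / all_samples      # 0.2
```

The total of 1 000 000 samples appears to correspond to the 1×10⁶ trajectories listed for DICP1M in the dataset comparison table, with a 4:1 split between training and validation.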

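Property	nuScenes	Argoverse	DICP1M
Prediction horizon/s	2	3	3
Number of trajectories	4.3×10³	324×10³	1×10⁶
Number of cities	2	2	8
Sampling frequency/Hz	2	10	10
Dynamic traffic environment information	-	-	√
Total duration/h	5.5	320	1 389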

Hyperparameter	Argoverse	DICP1M
Embedding dimension	128	128
Learning rate	5×10⁻⁴	5×10⁻⁴
Training epochs	64	64
Dropout probability	0.1	0.1
Training precision	bf16	bf16
Batch size	16	128
Weight decay	3×10⁻⁴	3×10⁻⁴
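
The hyperparameters in the table above can be captured as a small configuration object. This is a sketch under stated assumptions: the class `TrainConfig` and its field names are hypothetical, and it does not reflect how the authors organized their training code. Only the values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    """Hyperparameter values from the table above; field names are hypothetical."""
    embed_dim: int = 128
    learning_rate: float = 5e-4
    epochs: int = 64
    dropout: float = 0.1
    precision: str = "bf16"
    batch_size: int = 128
    weight_decay: float = 3e-4


# The two datasets share every setting except the batch size.
DICP1M_CONFIG = TrainConfig()
ARGOVERSE_CONFIG = TrainConfig(batch_size=16)
```

Making the dataclass frozen keeps a run's settings immutable once created, so the only difference between the two configurations stays explicit at the point of construction.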