Journal of Jilin University (Engineering and Technology Edition), 2019, Vol. 49, Issue (4): 1026-1033. doi: 10.13229/j.cnki.jdxbgxb20180467

Autonomous driving policy learning based on deep reinforcement learning and multi-type sensor data

Shun YANG, Yuan-de JIANG, Jian WU, Hai-zhen LIU

  1. State Key Laboratory of Automotive Simulation and Control, Jilin University, Changchun 130022, China
  • Received: 2018-05-11 Online: 2019-07-01 Published: 2019-07-16
  • Contact: Jian WU E-mail: yangshun628@163.com; wujian@jlu.edu.cn

Abstract:

This paper proposes a policy learning approach for autonomous driving based on deep reinforcement learning (DRL) and multi-type sensor data. Separate convolutional neural networks (CNNs) process the data from different sources (i.e., high-dimensional data from the camera and low-dimensional data from lidar, GPS, etc.). The features extracted by the CNNs are then combined to train the autonomous driving policy. Finally, TORCS, an open-source simulation platform, is used to validate the proposed method. The results demonstrate that the DRL model based on multi-type sensor data achieves good performance in both velocity control and lateral error control.

Key words: vehicle engineering, deep reinforcement learning, autonomous driving, lane keeping, multi-type sensor data

CLC Number: U469.79

Fig.1 Basic framework of reinforcement learning

Fig.2 DRL scheme based on multi-type sensor data input

Fig.3 Architecture of the competition software

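Table 1 List of chosen sensor information

Name       Range         Description
speedX     -∞ to +∞      Longitudinal speed of the vehicle, km/h
speedY     -∞ to +∞      Lateral speed of the vehicle, km/h
angle      [-π, +π]      Angle between the vehicle heading and the lane direction, rad
trackPos   -1 to 1       Normalized distance between the vehicle and the road center; 0 means the vehicle is on the center line, ±1 means it is on the left or right road edge
track      [0, 200]      Vector of 19 range-finder sensors, each returning the distance between the vehicle and the road edge, m
img        [0, 255]      First-person driving view image, which can be regarded as the camera output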
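
The signals in Table 1 can be grouped into one record per time step. The following is a minimal sketch, not the authors' code: the class name, the field names, and the scaling to [0, 1] are illustrative assumptions, and only the ranges and units come from the table.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SensorFrame:
    """One time step of the Table 1 signals (names are illustrative)."""
    speed_x: float      # longitudinal speed, km/h
    speed_y: float      # lateral speed, km/h
    angle: float        # heading relative to the lane, rad, in [-pi, pi]
    track_pos: float    # normalized lateral offset, 0 = center, +/-1 = road edges
    track: List[float]  # 19 range-finder distances to the road edge, m, in [0, 200]

    def validate(self) -> None:
        """Raise ValueError if a bounded signal leaves its Table 1 range."""
        if not -math.pi <= self.angle <= math.pi:
            raise ValueError("angle outside [-pi, pi]")
        if len(self.track) != 19:
            raise ValueError("track must hold 19 range readings")
        if any(not 0.0 <= d <= 200.0 for d in self.track):
            raise ValueError("track reading outside [0, 200] m")

    def physical_vector(self) -> List[float]:
        """The four low-dimensional physical signals as one vector."""
        return [self.speed_x, self.speed_y, self.angle, self.track_pos]

    def scaled_track(self) -> List[float]:
        """Range readings divided by 200 m (assumed preprocessing)."""
        return [d / 200.0 for d in self.track]


def scale_pixel(value: int) -> float:
    """Map an 8-bit intensity in [0, 255] to [0, 1] (assumed preprocessing)."""
    return value / 255.0
```

Checking ranges at the point where data enters the learning pipeline makes a misconfigured simulator interface fail early instead of silently corrupting training.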

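Table 2 Actor net structure

Name                       Physical information   Lidar information   Camera information
Input layer                4×4                    4×19                4×3×64×64
Feature extraction layer   4×(4×1) 32×(5×5)
                           4×(4×1) 32×(3×3)
                           32×(3×3)
Merge layer                1244×1
Fully connected layer      200×1
                           200×1
Output layer               3×1 (throttle, brake, steering)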
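
Table 2 gives the input shapes and the merged feature length (1244) but not the strides, padding, or per-branch output sizes. The sketch below is only a consistency check under loudly stated assumptions: the physical and lidar branches are assumed to keep 16 and 76 features, and the camera branch is assumed to end with 32 feature maps of 6×6. These branch sizes are hypothetical; they are shown because they happen to sum to 1244.

```python
from math import prod

# Input shapes copied from Table 2. What the leading 4 denotes (for example,
# four stacked time steps) is not stated in the table.
INPUT_SHAPES = {
    "physical": (4, 4),
    "lidar": (4, 19),
    "camera": (4, 3, 64, 64),
}

# Hypothetical feature shapes after the extraction layers (NOT from the source).
ASSUMED_FEATURE_SHAPES = {
    "physical": (4, 4),     # assumed: 16 features
    "lidar": (4, 19),       # assumed: 76 features
    "camera": (32, 6, 6),   # assumed: 32 maps, the 6x6 spatial size is a guess
}


def flat_size(shape):
    """Number of scalar elements in a tensor of the given shape."""
    return prod(shape)


def merged_size(feature_shapes):
    """Length of the vector obtained by flattening and concatenating all branches."""
    return sum(flat_size(shape) for shape in feature_shapes.values())


if __name__ == "__main__":
    for name, shape in INPUT_SHAPES.items():
        print(name, "input elements:", flat_size(shape))
    print("merged feature length:", merged_size(ASSUMED_FEATURE_SHAPES))
```

Other branch sizes could also add up to 1244, so this check cannot confirm the actual layer configuration.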

Table 3 Critic net structure

Name                      Network size
Input layer               4×(4+19+3×64×64) + 3
Fully connected layer 1   600×1
Fully connected layer 2   400×1
Output layer              1×1
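
The critic input size in Table 3 can be expanded numerically: the sensor term 4×(4+19+3×64×64) matches the three actor inputs flattened together, and the +3 matches the three actor outputs. The sketch below evaluates that expression. The function and parameter names are illustrative, and the parameter count assumes plain fully connected layers with biases, which the table does not confirm.

```python
def critic_input_size(frames=4, physical=4, lidar=19, image=(3, 64, 64), actions=3):
    """Evaluate frames * (physical + lidar + C*H*W) + actions from Table 3."""
    channels, height, width = image
    per_frame = physical + lidar + channels * height * width
    return frames * per_frame + actions


def dense_parameter_count(layer_sizes):
    """Weights plus biases of a stack of fully connected layers (assumed form)."""
    return sum((n_in + 1) * n_out
               for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


if __name__ == "__main__":
    n_in = critic_input_size()
    print("critic input size:", n_in)
    print("dense parameters:", dense_parameter_count([n_in, 600, 400, 1]))
```

Under these assumptions almost all parameters sit in the first layer, because the raw image pixels dominate the input vector.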

Fig.4 CG Speedway number 1 track shape and scenario

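Table 4 Training parameter setting

Name                                                             Value
Reference speed                                                  80
Discount factor                                                  0.99
Batch size                                                       16
Actor network learning rate                                      0.0001
Critic network learning rate                                     0.001
Maximum number of training episodes                              200
Maximum number of iteration steps                                250 000
Maximum number of state sequences stored in the replay buffer    100 000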
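
The values in Table 4 map naturally onto a configuration object and a fixed-capacity replay buffer. The following is a minimal sketch under assumptions: the class and field names are illustrative, and the one-step target r + γ·Q(s', a') is a common choice for actor-critic training rather than something this page confirms about the authors' update rule.

```python
import random
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Values from Table 4; field names are illustrative."""
    reference_speed: float = 80.0   # unit is not stated in Table 4
    discount: float = 0.99
    batch_size: int = 16
    actor_lr: float = 1e-4
    critic_lr: float = 1e-3
    max_episodes: int = 200
    max_steps: int = 250_000
    replay_capacity: int = 100_000


class ReplayBuffer:
    """Stores (state, action, reward, next_state, done) tuples up to a capacity."""

    def __init__(self, capacity):
        # A bounded deque discards the oldest transition once it is full.
        self._items = deque(maxlen=capacity)

    def add(self, transition):
        self._items.append(transition)

    def sample(self, batch_size, rng=random):
        """Draw a batch uniformly at random, without replacement."""
        return rng.sample(list(self._items), batch_size)

    def __len__(self):
        return len(self._items)


def td_target(reward, next_q, done, discount):
    """One-step target: reward plus discounted next value, no bootstrap at episode end."""
    return reward + (0.0 if done else discount * next_q)
```

A typical loop would call `add` after every simulator step and start sampling batches of `batch_size` transitions once the buffer holds at least that many.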

Fig.5 Cumulative reward of each training episode

Fig.6 Average reward per step in each episode

Fig.7 Longitudinal speed distribution of control result

Fig.8 Distribution of normalized angle (between vehicle and center line of lane)

Fig.9 Distribution of normalized lateral error

Fig.10 Illustration of lane keeping performance of DRL controller
