Lane-change Control for Unmanned Vehicle Based on REINFORCE Algorithm and Neural Network
Abstract: For lane-change and overtaking scenarios of unmanned vehicles, this paper studies a lane-change control strategy based on the REINFORCE algorithm and a neural network. The feedback variables, control input, and output-limiting requirements are determined from the vehicle dynamics model. The structure of the neural-network controller is designed, and a training scheme for the controller is developed according to the REINFORCE algorithm. To address the excessively large values and variance of the data in the experience pool, a preprocessing method for the experience-pool data is proposed to improve the training scheme. The sparse-reward problem that arises during reinforcement learning is analyzed in the context of the vehicle's operating scenario, and a reward-shaping solution based on a logarithmic function is proposed. The strategy is compared experimentally with PID and LQR controllers. The results show that, compared with PID, the proposed strategy has a smaller maximum error and yields a safer lane change; compared with LQR, its performance is close, which demonstrates its feasibility for the lane-change control task of unmanned vehicles. In addition, the execution time of the strategy on different platforms is recorded to demonstrate its real-time performance and the feasibility of running it on lightweight platforms.
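The abstract names two training aids without giving their formulas: preprocessing of the experience-pool data and a logarithm-based shaped reward. The sketch below is one plausible reading, not the paper's method. It assumes the preprocessing is a zero-mean, unit-variance standardization of the discounted returns, and it uses a hypothetical reward `-log(|e| + eps)` of the lateral error `e`. The function names, `gamma`, and `eps` are all illustrative choices.

```python
import math


def discounted_returns(rewards, gamma=0.99):
    """Compute G_t = r_t + gamma * G_{t+1} for every step of one episode."""
    returns = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        running = rewards[t] + gamma * running
        returns[t] = running
    return returns


def standardize(values, eps=1e-8):
    """Assumed preprocessing: shift to zero mean and scale to unit variance."""
    n = len(values)
    mean = sum(values) / n
    variance = sum((v - mean) ** 2 for v in values) / n
    std = math.sqrt(variance)
    return [(v - mean) / (std + eps) for v in values]


def log_shaped_reward(lateral_error, eps=1e-3):
    """Hypothetical dense reward: increases as |lateral_error| shrinks."""
    return -math.log(abs(lateral_error) + eps)


def reinforce_weights(lateral_errors, gamma=0.99):
    """Per-step weights that multiply grad log pi(a_t | s_t) in REINFORCE."""
    rewards = [log_shaped_reward(e) for e in lateral_errors]
    return standardize(discounted_returns(rewards, gamma))
```

Standardizing the returns keeps the gradient scale bounded regardless of episode length, and the logarithm gives a non-zero learning signal at every step instead of only when the lane change completes.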
Key words:
- traffic control
- unmanned vehicle
- lane-change control
- reinforcement learning

Table 1. Fixed parameters of vehicle

| Fixed parameter | Value |
|---|---|
| sf | 0.2 |
| sr | 0.2 |
| a | 1.232 |
| b | 1.468 |
| Ccf | 66900 |
| Ccr | 62700 |
| Clf | 66900 |
| Clr | 62700 |
| m | 1723 |
| Iz | 4175 |

Table 2. Parameters of the neural network

| | Layer 1 | Layer 2 |
|---|---|---|
| Input dimension | 5 | 200 |
| Output dimension | 200 | 51 |
| Activation function | tanh | none |

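The layer sizes in Table 2 can be turned into a forward pass as a quick sanity check. The sketch below uses only the dimensions and activations listed there. The weight initialization, the placeholder state vector, and the reading of the 51 outputs as logits over discretized control commands (sampled through a softmax) are assumptions, since the table does not describe them.

```python
import math
import random

IN_DIM, HIDDEN_DIM, OUT_DIM = 5, 200, 51  # sizes taken from Table 2


def init_layer(n_in, n_out, rng):
    """Uniform initialization scaled by fan-in (an illustrative choice)."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, weights, biases):
    """Affine map y = W x + b, with W stored as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def forward(state, params):
    """Layer 1 with tanh, then layer 2 with no activation (Table 2)."""
    (w1, b1), (w2, b2) = params
    hidden = [math.tanh(h) for h in dense(state, w1, b1)]
    return dense(hidden, w2, b2)


def softmax(logits):
    """Numerically stable softmax over the output vector."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
params = [init_layer(IN_DIM, HIDDEN_DIM, rng),
          init_layer(HIDDEN_DIM, OUT_DIM, rng)]
state = [0.0, 0.1, -0.2, 0.05, 0.0]  # placeholder 5-dimensional feedback vector
probs = softmax(forward(state, params))
# Assumed interpretation: sample one of 51 discrete commands from the policy.
action_index = rng.choices(range(OUT_DIM), weights=probs)[0]
```

With these sizes the network has 5 × 200 + 200 + 200 × 51 + 51 = 11,451 parameters, which is consistent with the abstract's claim that it can run on lightweight platforms.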
Table 3. Errors after lane change and the maximum error during lane change
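
| Speed and controller | Error after lane change/m | Maximum error during lane change/m |
|---|---|---|
| 10 m/s, REINFORCE | 0.02 | 0.06 |
| 10 m/s, PID | 0 | 0.17 |
| 10 m/s, LQR | 0 | 0.02 |
| 15 m/s, REINFORCE | 0.04 | 0.07 |
| 15 m/s, PID | 0 | 0.17 |
| 15 m/s, LQR | 0 | 0.05 |
| 20 m/s, REINFORCE | 0.06 | 0.07 |
| 20 m/s, PID | 0 | 0.17 |
| 20 m/s, LQR | 0 | 0.12 |
| 25 m/s, REINFORCE | 0.08 | 0.10 |
| 25 m/s, PID | 0 | 0.17 |
| 25 m/s, LQR | 0 | 0.19 |
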
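The comparison claimed in the abstract can be read directly off Table 3. The short sketch below only restates the table's numbers and filters them; the dictionary layout and variable names are illustrative.

```python
# (speed in m/s, controller) -> (error after lane change, max error during it), in m
TABLE_3 = {
    (10, "REINFORCE"): (0.02, 0.06), (10, "PID"): (0.0, 0.17), (10, "LQR"): (0.0, 0.02),
    (15, "REINFORCE"): (0.04, 0.07), (15, "PID"): (0.0, 0.17), (15, "LQR"): (0.0, 0.05),
    (20, "REINFORCE"): (0.06, 0.07), (20, "PID"): (0.0, 0.17), (20, "LQR"): (0.0, 0.12),
    (25, "REINFORCE"): (0.08, 0.10), (25, "PID"): (0.0, 0.17), (25, "LQR"): (0.0, 0.19),
}
SPEEDS = [10, 15, 20, 25]


def max_error(speed, controller):
    """Maximum lateral error during the lane change, in metres."""
    return TABLE_3[(speed, controller)][1]


# Speeds at which REINFORCE has a smaller maximum error than each baseline.
beats_pid = [v for v in SPEEDS if max_error(v, "REINFORCE") < max_error(v, "PID")]
beats_lqr = [v for v in SPEEDS if max_error(v, "REINFORCE") < max_error(v, "LQR")]
# Largest residual error left by REINFORCE once the lane change is complete.
worst_final_error = max(TABLE_3[(v, "REINFORCE")][0] for v in SPEEDS)
```

REINFORCE has the smaller maximum error against PID at all four speeds and against LQR at 20 and 25 m/s, while its residual error after the lane change grows with speed up to 0.08 m, where PID and LQR both settle to 0.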
Table 4. Running time of the neural-network controller
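
| Platform | Total simulation time/s | Total simulation steps | Average time per step/s |
|---|---|---|---|
| Computer | 2.83499 | 1202 | 0.00236 |
| TX2 | 3.89825 | 1202 | 0.00324 |
| Jetson Nano | 4.85962 | 1202 | 0.00404 |
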
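The last column of Table 4 is the total simulation time divided by the number of steps, which the sketch below reproduces. The `fits_period` helper is an illustrative addition: the control period it checks against is supplied by the caller and is not a value stated in the source.

```python
TOTAL_TIME_S = {"Computer": 2.83499, "TX2": 3.89825, "Jetson Nano": 4.85962}
STEPS = 1202  # same number of simulation steps on every platform (Table 4)


def mean_step_time(total_s, steps):
    """Average controller execution time per simulation step, in seconds."""
    return total_s / steps


def fits_period(step_time_s, period_s):
    """True if one controller step finishes within a given control period."""
    return step_time_s < period_s


per_step = {name: mean_step_time(t, STEPS) for name, t in TOTAL_TIME_S.items()}
for name, seconds in per_step.items():
    print(f"{name}: {seconds * 1000:.2f} ms per step")
```

For example, `fits_period(per_step["Jetson Nano"], 0.01)` tests the slowest platform against a hypothetical 10 ms control period.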