Although there exist algorithms that give speed advice to cyclists approaching traffic lights with uncertain timing, they all need to know, and thus assume, the cyclist's response to the advice in order to optimise that advice. To relax this assumption, this paper proposes an algorithm that combines reinforcement learning and planning: it learns the cyclist's reaction to the advice and uses this information to plan the best next advice on the fly. Rather than the single search procedure that is conventional in existing architectures, the algorithm uses two sample-based search procedures. This makes it possible to obtain an accurate local approximation of the action-value function despite the short computation time available in each decision epoch. The algorithm is tested in a simulation case study, which confirms the impact of a proper initialisation of the action-value function as well as the importance of using two search procedures.
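The idea of refining a local action-value estimate with two sample-based search procedures before each advice decision can be sketched as follows. This is an illustrative toy, not the paper's algorithm: the environment model, state layout, reward, and the coarse-then-fine split between the two procedures are all hypothetical placeholders.

```python
import random

# Hypothetical speed advice actions: slow down, hold speed, speed up.
ACTIONS = [-1, 0, 1]

def step(state, action, rng):
    """Hypothetical stochastic transition model of a cyclist
    approaching a light; not the paper's model."""
    pos, speed = state
    speed = max(0, min(3, speed + action))
    pos = max(0, pos - speed)
    # Penalise coming to a full stop before the light; reward progress.
    noise = rng.uniform(-0.05, 0.05)
    reward = -1.0 if (pos == 0 and speed == 0) else 0.1 * speed + noise
    return (pos, speed), reward

def rollout(state, action, depth, rng):
    """One sampled rollout: take `action`, then follow a random policy."""
    s, r = step(state, action, rng)
    total = r
    for _ in range(depth - 1):
        s, r = step(s, rng.choice(ACTIONS), rng)
        total += r
    return total

def search(state, actions, n_samples, depth, rng):
    """Sample-based search: average rollouts to approximate Q(state, a)
    locally for the given candidate actions."""
    return {a: sum(rollout(state, a, depth, rng)
                   for _ in range(n_samples)) / n_samples
            for a in actions}

def best_advice(state, rng):
    # First search procedure: cheap screening over all actions.
    coarse = search(state, ACTIONS, n_samples=10, depth=3, rng=rng)
    top2 = sorted(coarse, key=coarse.get, reverse=True)[:2]
    # Second search procedure: refine only the top candidates with more
    # samples and a deeper horizon, within the same decision epoch.
    fine = search(state, top2, n_samples=60, depth=6, rng=rng)
    return max(fine, key=fine.get)

advice = best_advice((10, 1), random.Random(0))
print(advice)
```

The two-stage split is one plausible reading of why a second procedure helps under a tight per-epoch time budget: the coarse pass spends few samples per action, and the refined pass concentrates the remaining budget where it reduces variance most.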
|Title of host publication||2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019|
|Place of Publication||Piscataway, NJ, USA|
|Publication status||Published - 2019|
|Event||22nd IEEE International Conference on Intelligent Transportation Systems, ITSC 2019 - Auckland, New Zealand|
Duration: 27 Oct 2019 → 30 Oct 2019
|Conference||22nd IEEE International Conference on Intelligent Transportation Systems, ITSC 2019|
|Period||27/10/19 → 30/10/19|