Automatic learning of cyclist's compliance for speed advice at intersections - a reinforcement learning-based approach

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

2 Citations (Scopus)

Abstract

Although algorithms exist that give speed advice to cyclists approaching traffic lights with uncertain timing, they all need to know, and thus assume, the cyclist's response to the advice in order to optimize it. To relax this assumption, this paper proposes an algorithm that combines reinforcement learning and planning to learn the cyclist's reaction to the advice and uses this information to plan the best next advice on the fly. Rather than the single search procedure that is conventional in existing architectures, the algorithm uses two sample-based search procedures. This makes it possible to obtain an accurate local approximation of the action-value function despite the short computation time available in each decision epoch. The algorithm is tested in a simulation case study, which confirms the impact of a proper initialisation of the action-value function as well as the importance of using two search procedures.
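As a rough illustration of the idea in the abstract, the Python sketch below learns a cyclist's compliance from observed speed changes and then uses sampling to estimate the value of each candidate advice. It is not the paper's algorithm: the linear compliance model, the cost function, the `STOP_PENALTY` constant, and all names (`ComplianceModel`, `estimate_action_values`, `sample_green_start`) are assumptions made for this example, and it uses a single sample-based search rather than the two procedures the paper proposes, whose details the abstract does not give.

```python
import random

STOP_PENALTY = 5.0  # hypothetical extra cost (seconds) for having to stop


class ComplianceModel:
    """Running estimate of how strongly the cyclist follows advice.

    Assumed linear model: 0 means the advice is ignored, 1 means the
    cyclist fully adopts the advised speed.
    """

    def __init__(self, initial=0.5, learning_rate=0.2):
        self.compliance = initial
        self.learning_rate = learning_rate

    def update(self, speed_before, advised_speed, speed_after):
        gap = advised_speed - speed_before
        if abs(gap) < 1e-9:
            return  # advice equal to the current speed carries no information
        observed = (speed_after - speed_before) / gap
        observed = min(1.0, max(0.0, observed))  # clip to [0, 1]
        # Move the estimate a fraction of the way toward the observation.
        self.compliance += self.learning_rate * (observed - self.compliance)

    def predict(self, speed, advised_speed):
        return speed + self.compliance * (advised_speed - speed)


def estimate_action_values(model, distance, speed, candidate_advice,
                           sample_green_start, n_samples=200, rng=None):
    """Monte Carlo estimate of the value of each candidate advised speed.

    sample_green_start(rng) is a hypothetical callable returning one sample
    of the uncertain time (seconds from now) at which the light turns green.
    Speeds are assumed to be strictly positive.
    """
    rng = rng or random.Random(0)
    values = {}
    for advice in candidate_advice:
        arrival = distance / model.predict(speed, advice)
        total = 0.0
        for _ in range(n_samples):
            green_start = sample_green_start(rng)
            if arrival >= green_start:
                total += -arrival  # rides through; cost is travel time
            else:
                total += -(green_start + STOP_PENALTY)  # stops and waits
        values[advice] = total / n_samples
    return values


if __name__ == "__main__":
    model = ComplianceModel()
    model.update(speed_before=5.0, advised_speed=3.0, speed_after=3.5)
    values = estimate_action_values(
        model, distance=60.0, speed=5.0,
        candidate_advice=[3.0, 4.0, 5.0],
        sample_green_start=lambda rng: rng.uniform(10.0, 20.0),
    )
    print(max(values, key=values.get), values)
```

In this sketch the learned compliance feeds directly into the planner: the more the cyclist has ignored past advice, the smaller the predicted speed change, and the less the estimated values of different advice differ.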

Original language: English
Title of host publication: 2019 IEEE Intelligent Transportation Systems Conference, ITSC 2019
Place of Publication: Piscataway, NJ, USA
Publisher: IEEE
Pages: 2375-2380
ISBN (Electronic): 9781538670248
Publication status: Published - 2019
Event: 22nd IEEE International Conference on Intelligent Transportation Systems, ITSC 2019 - Auckland, New Zealand
Duration: 27 Oct 2019 to 30 Oct 2019
https://www.itsc2019.org/

Conference

Conference: 22nd IEEE International Conference on Intelligent Transportation Systems, ITSC 2019
Country/Territory: New Zealand
City: Auckland
Period: 27/10/19 to 30/10/19
