Comparing Exploration Approaches in Deep Reinforcement Learning for Traffic Light Control

Y. Oren; R.A.N. Starre; F.A. Oliehoek

Interactive Intelligence

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Identifying the most efficient exploration approach for deep reinforcement learning in traffic light control is not a trivial task, and can be a critical step in the development of reinforcement learning solutions that can effectively reduce traffic congestion. It is common to use baseline dithering methods such as ε-greedy. However, the value of more evolved exploration approaches in this setting has not yet been determined. This paper addresses this concern by comparing the performance of the popular deep Q-learning algorithm using one baseline and two state-of-the-art exploration approaches, and their combination. Specifically, ε-greedy is used as a baseline, and compared to the exploration approaches Bootstrapped DQN, randomized prior functions, and their combination. This is done in three different traffic scenarios, capturing different traffic profiles. The results obtained suggest that the higher the complexity of the traffic scenario, and the larger the size of the observation space of the agent, the larger the gain from efficient exploration. This is illustrated by the improved performance observed in the agents using efficient exploration and enjoying a larger observation space in the complex traffic scenarios.
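
To make the contrast between the compared exploration approaches concrete, the following is a minimal sketch of how action selection differs between ε-greedy dithering and a bootstrapped ensemble with randomized prior functions. It is an illustration only, not the authors' implementation: the function names, the plain-list Q-values, the `prior_scale` parameter, and the three-action example are all assumptions made here for the sake of the example.

```python
import random


def epsilon_greedy(q_values, epsilon, rng):
    """Dithering baseline: with probability epsilon pick a uniformly
    random action, otherwise pick the action with the highest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def bootstrapped_prior_action(heads, priors, head_index, prior_scale):
    """Ensemble-based exploration: act greedily with respect to ONE head.

    heads[k][a]  : trainable Q estimate of head k for action a (hypothetical values)
    priors[k][a] : output of a fixed, untrained prior for head k (hypothetical values)
    prior_scale  : weight of the prior; 0.0 reduces to plain Bootstrapped DQN
    """
    q = [h + prior_scale * p for h, p in zip(heads[head_index], priors[head_index])]
    return max(range(len(q)), key=lambda a: q[a])


rng = random.Random(0)

# Hypothetical Q-values for three signal phases at one intersection.
q_values = [0.1, 0.7, 0.3]
greedy_action = epsilon_greedy(q_values, epsilon=0.0, rng=rng)

# Two hypothetical ensemble heads with their fixed priors.
heads = [[0.1, 0.7, 0.3], [0.2, 0.1, 0.6]]
priors = [[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]]

# One head is sampled and then followed for a whole episode.
head_index = rng.randrange(len(heads))
action = bootstrapped_prior_action(heads, priors, head_index, prior_scale=1.0)
```

The design difference is that ε-greedy perturbs individual actions at random, whereas the ensemble commits to one head's value estimate for an entire episode, so exploration is temporally consistent. The fixed priors keep the heads diverse even in states where little data has been collected.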

Original language	English
Title of host publication	BNAIC/BeneLearn 2020
Publisher	RU Leiden
Pages	179-193
Publication status	Published - 2020
Event	BNAIC/BENELEARN 2020 - Leiden, Netherlands; Duration: 19 Nov 2020 → 20 Nov 2020

Conference

Conference	BNAIC/BENELEARN 2020
Country/Territory	Netherlands
City	Leiden
Period	19/11/20 → 20/11/20

Access to Document

bnaic2020proceedings01 (Final published version, 9.89 MB)

http://bnaic.liacs.leidenuniv.nl/bnaic2020proceedings.pdf

Cite this

@inproceedings{423f0b70f26546bebc582bd8daef7a25,

title = "Comparing Exploration Approaches in Deep Reinforcement Learning for Traffic Light Control",

abstract = "Identifying the most efficient exploration approach for deep reinforcement learning in traffic light control is not a trivial task, and can be a critical step in the development of reinforcement learning solutions that can effectively reduce traffic congestion. It is common to use baseline dithering methods such as $\epsilon$-greedy. However, the value of more evolved exploration approaches in this setting has not yet been determined. This paper addresses this concern by comparing the performance of the popular deep Q-learning algorithm using one baseline and two state-of-the-art exploration approaches, and their combination. Specifically, $\epsilon$-greedy is used as a baseline, and compared to the exploration approaches Bootstrapped DQN, randomized prior functions, and their combination. This is done in three different traffic scenarios, capturing different traffic profiles. The results obtained suggest that the higher the complexity of the traffic scenario, and the larger the size of the observation space of the agent, the larger the gain from efficient exploration. This is illustrated by the improved performance observed in the agents using efficient exploration and enjoying a larger observation space in the complex traffic scenarios.",

author = "Y. Oren and R.A.N. Starre and F.A. Oliehoek",

year = "2020",

language = "English",

pages = "179--193",

booktitle = "BNAIC/BeneLearn 2020",

publisher = "RU Leiden",

note = "BNAIC/BENELEARN 2020 ; Conference date: 19-11-2020 Through 20-11-2020",

}

TY - GEN

T1 - Comparing Exploration Approaches in Deep Reinforcement Learning for Traffic Light Control

AU - Oren, Y.

AU - Starre, R.A.N.

AU - Oliehoek, F.A.

PY - 2020

Y1 - 2020

N2 - Identifying the most efficient exploration approach for deep reinforcement learning in traffic light control is not a trivial task, and can be a critical step in the development of reinforcement learning solutions that can effectively reduce traffic congestion. It is common to use baseline dithering methods such as ε-greedy. However, the value of more evolved exploration approaches in this setting has not yet been determined. This paper addresses this concern by comparing the performance of the popular deep Q-learning algorithm using one baseline and two state-of-the-art exploration approaches, and their combination. Specifically, ε-greedy is used as a baseline, and compared to the exploration approaches Bootstrapped DQN, randomized prior functions, and their combination. This is done in three different traffic scenarios, capturing different traffic profiles. The results obtained suggest that the higher the complexity of the traffic scenario, and the larger the size of the observation space of the agent, the larger the gain from efficient exploration. This is illustrated by the improved performance observed in the agents using efficient exploration and enjoying a larger observation space in the complex traffic scenarios.

AB - Identifying the most efficient exploration approach for deep reinforcement learning in traffic light control is not a trivial task, and can be a critical step in the development of reinforcement learning solutions that can effectively reduce traffic congestion. It is common to use baseline dithering methods such as ε-greedy. However, the value of more evolved exploration approaches in this setting has not yet been determined. This paper addresses this concern by comparing the performance of the popular deep Q-learning algorithm using one baseline and two state-of-the-art exploration approaches, and their combination. Specifically, ε-greedy is used as a baseline, and compared to the exploration approaches Bootstrapped DQN, randomized prior functions, and their combination. This is done in three different traffic scenarios, capturing different traffic profiles. The results obtained suggest that the higher the complexity of the traffic scenario, and the larger the size of the observation space of the agent, the larger the gain from efficient exploration. This is illustrated by the improved performance observed in the agents using efficient exploration and enjoying a larger observation space in the complex traffic scenarios.

M3 - Conference contribution

SP - 179

EP - 193

BT - BNAIC/BeneLearn 2020

PB - RU Leiden

T2 - BNAIC/BENELEARN 2020

Y2 - 19 November 2020 through 20 November 2020

ER -
