Abstract
Pre-training is a process used to enhance the learning of deep reinforcement learning (RL) algorithms through initial guidance from an expert demonstrator. It involves training a neural network to replicate the outputs of a selected expert before allowing the RL agent to specialise and develop its own policy. This paper outlines a study that analyses the impact of pre-training on deep RL algorithms used in ramp metering. Specifically, behaviour cloning is performed for increasing lengths of time (0-10,000 epochs), with ALINEA as the chosen expert algorithm guiding a proposed Proximal Policy Optimisation (PPO)-based system. The results confirm that, for the same length of training, some initial guidance through pre-training can significantly improve the system’s effectiveness in reducing congestion compared to no pre-training. Conversely, excessive pre-training may lead to overfitting and reduced generalisability. However, design issues resulting in weak model convergence limit the algorithm’s overall performance in the chosen scenario.
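As a rough illustration of the approach described in the abstract, the sketch below shows behaviour-cloning pre-training of a small actor network on ALINEA's metering decisions before the weights would be handed over to PPO. This is a minimal, hypothetical sketch: the network architecture, the state layout (downstream occupancy and previous metering rate), the ALINEA parameters, and the synthetic demonstration data are assumptions for illustration, not the paper's implementation; in the study, demonstrations would come from ALINEA acting in the SUMO simulation.

```python
# Hypothetical sketch of behaviour-cloning pre-training against an ALINEA expert.
# Names such as `alinea_rate`, `PolicyNet`, and the observation layout are
# illustrative assumptions, not the authors' code.
import torch
import torch.nn as nn


def alinea_rate(prev_rate, occupancy, target_occ=0.18, K_R=70.0,
                r_min=200.0, r_max=2000.0):
    """Classic ALINEA update: r(k) = r(k-1) + K_R * (target - measured occupancy)."""
    rate = prev_rate + K_R * (target_occ - occupancy)
    return float(min(max(rate, r_min), r_max))


class PolicyNet(nn.Module):
    """Tiny actor network; PPO would later reuse these pre-trained weights."""

    def __init__(self, obs_dim=2, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(obs_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, 1), nn.Sigmoid(),  # metering rate scaled to [0, 1]
        )

    def forward(self, obs):
        return self.net(obs)


def pretrain(policy, epochs=2000, batch=256, r_max=2000.0):
    """Supervised (behaviour cloning) phase: regress the expert's metering rate."""
    opt = torch.optim.Adam(policy.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        # Synthetic (state, expert action) pairs; in the study these would be
        # gathered from ALINEA controlling the on-ramp in the SUMO simulation.
        occ = torch.rand(batch, 1)                # downstream occupancy in [0, 1]
        prev = torch.rand(batch, 1) * r_max       # previous metering rate (veh/h)
        expert = torch.tensor(
            [[alinea_rate(p.item(), o.item()) / r_max] for p, o in zip(prev, occ)]
        )
        obs = torch.cat([occ, prev / r_max], dim=1)
        loss = loss_fn(policy(obs), expert)
        opt.zero_grad()
        loss.backward()
        opt.step()
    return policy


if __name__ == "__main__":
    cloned = pretrain(PolicyNet(), epochs=500)
    # The cloned weights would then initialise the PPO actor, which continues
    # training on the congestion-reduction reward signal.
```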
| Original language | English |
|---|---|
| Title of host publication | Proceedings of the National Academy of Science’s Transportation Research Board 104th Annual Meeting |
| Publication status | Unpublished - 7 Jan 2025 |
| Event | 104th Annual Meeting of the Transportation Research Board (TRB), Walter E. Washington Convention Center, Washington DC, United States. Duration: 5 Jan 2025 → 9 Jan 2025. https://trb-annual-meeting.nationalacademies.org/schedule |
Conference
| Conference | 104th Annual Meeting of the Transportation Research Board (TRB) |
|---|---|
| Abbreviated title | TRB 2025 |
| Country/Territory | United States |
| City | Washington DC |
| Period | 5/01/25 → 9/01/25 |
| Internet address | https://trb-annual-meeting.nationalacademies.org/schedule |
Keywords
- Network management
- Road traffic control
- Ramp metering
- Reinforcement learning
Country (case study)
- Netherlands
Datasets
- TUD-SUMO
  Evans, C. (Creator), GitHub, 16 Jul 2024
  https://github.com/DAIMoNDLab/tud-sumo, https://pypi.org/project/tud-sumo/
  Dataset/Software: Software