Global synchromodal shipment matching problem with dynamic and stochastic travel times: a reinforcement learning approach

W. Guo; B. Atasoy; R. R. Negenborn

doi:10.1007/s10479-021-04489-z

W. Guo^*, B. Atasoy, R. R. Negenborn

^*Corresponding author for this work

Transport Engineering and Logistics

Research output: Contribution to journal › Article › Scientific › peer-review

Abstract

Global synchromodal transportation involves the movement of container shipments between inland terminals located on different continents using ships, barges, trains, trucks, or any combination of them through integrated planning at a network level. One of the challenges faced by global operators is the matching of accepted shipments with services in an integrated global synchromodal transport network with dynamic and stochastic travel times. The travel times of services are unknown and revealed dynamically during the execution of transport plans, but stochastic information about travel times is assumed to be available. Matching decisions can be updated before shipments arrive at their destination terminals. The objective of the problem is to maximize the total profit, expressed as a combination of revenues, travel costs, transfer costs, storage costs, delay costs, and carbon tax over a given planning horizon. We propose a sequential decision process model to describe the problem. To address the curse of dimensionality, we develop a reinforcement learning approach to learn the value of matching a shipment with a service through simulations. Specifically, we adopt the Q-learning algorithm to update value function estimates and use the ϵ-greedy strategy to balance exploitation and exploration. Online decisions are made based on the estimated value functions. The performance of the reinforcement learning approach is evaluated in comparison to a myopic approach that does not consider uncertainties and a stochastic approach that sets chance constraints on feasible transshipment under a rolling horizon framework.
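The learning loop named in the abstract (Q-learning updates with an ϵ-greedy choice of service, trained through simulation) can be illustrated with a minimal tabular sketch. This is not the authors' model: the two-leg corridor A -> B -> C, the service names, costs, travel-time samples, deadline, and every parameter value below are hypothetical and chosen only to keep the example small. Transfer costs, storage costs, and carbon tax are left out.

```python
import random
from collections import defaultdict

# Hypothetical two-leg corridor A -> B -> C; every number below is made up.
# Each service maps to (travel cost, equally likely travel times in hours).
SERVICES = {
    "A": {"barge_AB": (10.0, [4, 6, 10]), "truck_AB": (40.0, [3])},
    "B": {"train_BC": (15.0, [6]), "truck_BC": (40.0, [3])},
}
NEXT_TERMINAL = {"A": "B", "B": "C"}
REVENUE, DEADLINE, DELAY_COST_PER_HOUR = 100.0, 12, 20.0


def step(state, service, rng):
    """Simulate one leg: sample a travel time, return (reward, next_state)."""
    terminal, elapsed = state
    cost, possible_times = SERVICES[terminal][service]
    arrival = elapsed + rng.choice(possible_times)  # revealed only on arrival
    nxt = NEXT_TERMINAL[terminal]
    reward = -cost
    if nxt == "C":  # delivered: collect revenue, pay for any delay
        reward += REVENUE - DELAY_COST_PER_HOUR * max(0, arrival - DEADLINE)
        return reward, None
    return reward, (nxt, arrival)


def choose(q, state, epsilon, rng):
    """epsilon-greedy: explore with probability epsilon, otherwise exploit."""
    actions = list(SERVICES[state[0]])
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: q[(state, a)])


def train(episodes=20000, alpha=0.02, gamma=1.0, epsilon=0.2, seed=0):
    """Learn Q[(state, service)], the estimated profit-to-go of a match."""
    rng = random.Random(seed)
    q = defaultdict(float)
    for _ in range(episodes):
        state = ("A", 0)  # shipment released at terminal A at time 0
        while state is not None:
            action = choose(q, state, epsilon, rng)
            reward, nxt = step(state, action, rng)
            best_next = 0.0 if nxt is None else max(
                q[(nxt, a)] for a in SERVICES[nxt[0]])
            # Q-learning update toward the bootstrapped target.
            q[(state, action)] += alpha * (
                reward + gamma * best_next - q[(state, action)])
            state = nxt
    return q


def greedy(q, state):
    """Online decision: pick the service with the highest estimated value."""
    return max(SERVICES[state[0]], key=lambda a: q[(state, a)])


if __name__ == "__main__":
    q = train()
    for s in [("A", 0), ("B", 4), ("B", 10)]:
        print(s, "->", greedy(q, s))
```

In this toy setting the state is the pair (terminal, elapsed hours), so the decision at B depends on the travel time revealed on the first leg: a shipment that reaches B after 10 hours should switch to the faster truck, while one that arrives after 4 hours can take the cheaper train. Under these made-up numbers the expected profit from A is 60 for the barge and 45 for the truck, which is what the learned values should approach.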

Original language	English
Number of pages	32
Journal	Annals of Operations Research
DOIs	https://doi.org/10.1007/s10479-021-04489-z
Publication status	Published - 2022

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Dynamic and stochastic travel times
Global synchromodal shipment matching
Q-learning
Reinforcement learning
Sequential decision process

Access to Document

10.1007/s10479-021-04489-z

Guo2022_Article_GlobalSynchromodalShipmentMatc (Final published version, 1.73 MB)

Cite this

@article{c376a29a5e29464c8fecaddc1a1c05d2,

title = "Global synchromodal shipment matching problem with dynamic and stochastic travel times: a reinforcement learning approach",

abstract = "Global synchromodal transportation involves the movement of container shipments between inland terminals located in different continents using ships, barges, trains, trucks, or any combination among them through integrated planning at a network level. One of the challenges faced by global operators is the matching of accepted shipments with services in an integrated global synchromodal transport network with dynamic and stochastic travel times. The travel times of services are unknown and revealed dynamically during the execution of transport plans, but the stochastic information of travel times are assumed available. Matching decisions can be updated before shipments arrive at their destination terminals. The objective of the problem is to maximize the total profits that are expressed in terms of a combination of revenues, travel costs, transfer costs, storage costs, delay costs, and carbon tax over a given planning horizon. We propose a sequential decision process model to describe the problem. In order to address the curse of dimensionality, we develop a reinforcement learning approach to learn the value of matching a shipment with a service through simulations. Specifically, we adopt the Q-learning algorithm to update value function estimations and use the ϵ-greedy strategy to balance exploitation and exploration. Online decisions are created based on the estimated value functions. The performance of the reinforcement learning approach is evaluated in comparison to a myopic approach that does not consider uncertainties and a stochastic approach that sets chance constraints on feasible transshipment under a rolling horizon framework.",

keywords = "Dynamic and stochastic travel times, Global synchromodal shipment matching, Q-learning, Reinforcement learning, Sequential decision process",

author = "Guo, W. and Atasoy, B. and Negenborn, {R. R.}",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.",

year = "2022",

doi = "10.1007/s10479-021-04489-z",

language = "English",

journal = "Annals of Operations Research",

issn = "0254-5330",

publisher = "Springer",

}

TY - JOUR

T1 - Global synchromodal shipment matching problem with dynamic and stochastic travel times

T2 - a reinforcement learning approach

AU - Guo, W.

AU - Atasoy, B.

AU - Negenborn, R. R.

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2022

Y1 - 2022

N2 - Global synchromodal transportation involves the movement of container shipments between inland terminals located in different continents using ships, barges, trains, trucks, or any combination among them through integrated planning at a network level. One of the challenges faced by global operators is the matching of accepted shipments with services in an integrated global synchromodal transport network with dynamic and stochastic travel times. The travel times of services are unknown and revealed dynamically during the execution of transport plans, but the stochastic information of travel times are assumed available. Matching decisions can be updated before shipments arrive at their destination terminals. The objective of the problem is to maximize the total profits that are expressed in terms of a combination of revenues, travel costs, transfer costs, storage costs, delay costs, and carbon tax over a given planning horizon. We propose a sequential decision process model to describe the problem. In order to address the curse of dimensionality, we develop a reinforcement learning approach to learn the value of matching a shipment with a service through simulations. Specifically, we adopt the Q-learning algorithm to update value function estimations and use the ϵ-greedy strategy to balance exploitation and exploration. Online decisions are created based on the estimated value functions. The performance of the reinforcement learning approach is evaluated in comparison to a myopic approach that does not consider uncertainties and a stochastic approach that sets chance constraints on feasible transshipment under a rolling horizon framework.

AB - Global synchromodal transportation involves the movement of container shipments between inland terminals located in different continents using ships, barges, trains, trucks, or any combination among them through integrated planning at a network level. One of the challenges faced by global operators is the matching of accepted shipments with services in an integrated global synchromodal transport network with dynamic and stochastic travel times. The travel times of services are unknown and revealed dynamically during the execution of transport plans, but the stochastic information of travel times are assumed available. Matching decisions can be updated before shipments arrive at their destination terminals. The objective of the problem is to maximize the total profits that are expressed in terms of a combination of revenues, travel costs, transfer costs, storage costs, delay costs, and carbon tax over a given planning horizon. We propose a sequential decision process model to describe the problem. In order to address the curse of dimensionality, we develop a reinforcement learning approach to learn the value of matching a shipment with a service through simulations. Specifically, we adopt the Q-learning algorithm to update value function estimations and use the ϵ-greedy strategy to balance exploitation and exploration. Online decisions are created based on the estimated value functions. The performance of the reinforcement learning approach is evaluated in comparison to a myopic approach that does not consider uncertainties and a stochastic approach that sets chance constraints on feasible transshipment under a rolling horizon framework.

KW - Dynamic and stochastic travel times

KW - Global synchromodal shipment matching

KW - Q-learning

KW - Reinforcement learning

KW - Sequential decision process

UR - http://www.scopus.com/inward/record.url?scp=85123263562&partnerID=8YFLogxK

U2 - 10.1007/s10479-021-04489-z

DO - 10.1007/s10479-021-04489-z

M3 - Article

AN - SCOPUS:85123263562

SN - 0254-5330

JO - Annals of Operations Research

JF - Annals of Operations Research

ER -
