Interaction-Aware Motion Planning in Crowded Dynamic Environments

B.F. Ferreira de Brito

doi:10.4233/uuid:0bc3677d-dc15-4f93-a079-8d600e967c5a

Interaction-Aware Motion Planning in Crowded Dynamic Environments

B.F. Ferreira de Brito

Learning & Autonomous Control

Research output: Thesis › Dissertation (TU Delft)

189 Downloads (Pure)

Abstract

Autonomous robots will profoundly impact our society, making our roads safer, reducing labor costs and carbon dioxide (CO2) emissions, and improving our life quality. However, to make that happen, robots need to navigate among humans, which is extremely difficult. Firstly, humans do not explicitly communicate their intentions and use intuition to reason about others' plans to avoid collisions. Secondly, humans exploit interactions to navigate efficiently in cluttered environments. Traditional motion planning methods for autonomous navigation in human environments use geometry, physics, topologies, and handcrafted functions to account for interaction but only plan one step. In contrast, trajectory optimization methods allow planning over a prediction horizon accounting for the environment evolution. Yet, these methods scale poorly with the number of agents and assume structured scenarios with a limited number of interacting agents. Learning-based approaches overcome the latter by learning a policy's parameters offline, e.g., from data or simulation. However, to date, learned policies show poor performance and unpredictable behavior when employed in reality as the conditions differ from the learning environment. Moreover, learning-based approaches do not guarantee collision avoidance or feasibility with respect to the robot dynamics. Therefore, this thesis aims to develop motion planning algorithms generating online predictive and interaction-aware motion plans to enable robots' safe and efficient navigation among humans.

The first main contribution of this thesis is a predictive motion planning algorithm for autonomous robot navigation in unstructured environments populated with pedestrians. The proposed method builds on nonlinear model-predictive contouring control proposing a local formulation (LMPCC) to generate predictive motion plans in real-time. Static collision avoidance is achieved by constraining the robot's positions to stay within a set of convex regions approximating the surrounding free space computed from a static map. Moreover, an upper bound for the Minkowski sum of a circle and an ellipse is proposed and used as an inequality constraint to ensure dynamic collision avoidance, assuming the robot's space as a circle and the dynamic obstacles', for instance pedestrians, space ellipsoid. The LMPCC approach is analyzed and compared against a reactive and a learning-based approach in simulation. Experimentally, the method is tested fully onboard on a mobile robot platform (Clearpath Jackal) and in an autonomous car (Toyota Prius).

In real scenarios, pedestrians do not explicitly communicate their intentions, and therefore, LMPCC uses a constant velocity (CV) model to estimate their future trajectories. However, CV predictions ignore the environment constraints, e.g., static obstacles, the interaction between agents, and the inherent uncertainty and multimodality of the pedestrians' motion. Hence, this thesis presents a variational recurrent neural network architecture (Social-VRNN) for interaction-aware and multi-modal trajectory predictions. The Social-VRNN fuses information of the pedestrian's dynamics, static obstacles, and surrounding pedestrians and outputs the parameters of a Gaussian Mixture Model (GMM). A variational Bayesian learning approach is employed to learn the model's parameters minimizing the evidence lower bound (ELBO). Experimental results on real and simulation data are presented, showing that our model can effectively learn to predict multiple trajectories capturing the different courses that a pedestrian may follow.

Enhancing the LMPCC method with interaction-aware predictions is insufficient to enable safe and efficient autonomous navigation in cluttered environments. The LMPCC is a local trajectory optimization method and considers a limited planning horizon to enable online motion planning. Consequently, LMPCC plans can be locally optimal, which may result in catastrophic failures, such as deadlocks and collisions, in the long term. To overcome the latter, global guidance, e.g., cost-to-go heuristics, to the optimization problem is an option. In contrast to optimization-based methods, learning-based methods, i.e., deep reinforcement learning (DRL), allow learning policies to optimize long-term rewards in an offline training phase. Therefore, this thesis introduces two novel frameworks enhancing state-of-art online optimization-based planners with learned global guidance policies applied to mobile robot navigation in cluttered environments and autonomous vehicles driving in dense traffic.

Firstly, the Goal-Oriented Model Predictive Controller (GO-MPC) is introduced, tackling the problem that the robot’s global goal is often located far beyond the planning horizon resulting in locally optimal motion plans. The framework proposes to use DRL to learn an interaction-aware policy providing the next optimal subgoal position to an MPC planner. The recommended subgoal helps the robot progress towards its end goal and accounts for the expected interaction with other agents. Based on the recommended subgoal, the MPC planner then optimizes the inputs for the robot, satisfying its kinodynamic and collision avoidance constraints. Simulation results are presented demonstrating that GO-MPC enhances the navigation performance in terms of safety and efficiency, i.e., travel time, compared to solely based MPC and deep RL frameworks in mixed settings, i.e., with cooperative and non-cooperative agents, and multi-robot scenarios.

Secondly, the interactive Model Predictive Controller (IntMPC) for safe navigation in dense traffic scenarios is presented. While GO-MPC learns a subgoal policy, the IntMPC learns a velocity reference policy exploring the connection between human driving behavior and their velocity changes when interacting. Hence, the IntMPC approach learns, via deep Reinforcement Learning (RL), an interaction-aware policy providing global guidance as a velocity reference allowing to control and exploit the interaction with the other vehicles. Simulation results are presented demonstrating that the learned policy can reason about the cooperativeness of other vehicles and enable the local planner with interactive behavior to pro-actively merge in dense traffic while remaining safe in case the other vehicles do not yield.

Overall, this thesis contributes to enhancing autonomous robots with predictive behavior, with the ability to infer the others' trajectories and operate in cluttered environments.

However, two important limitations remain to be solved: the proposed motion planner computes open-loop interaction-aware motion plans and does not account for interaction in closed-loop. Moreover, the prediction model and guidance policies rely solely on offline learning. Future works may investigate how to account for interaction in the planning stage and how online data streams can be used to improve the navigation algorithm's performance over time.

Original language	English
Awarding Institution	Delft University of Technology
Supervisors/Advisors	Alonso-Mora, Javier, Supervisor Babuska, R., Supervisor
Award date	6 Oct 2022
Print ISBNs	978-94-6366-605-3
DOIs	https://doi.org/10.4233/uuid:0bc3677d-dc15-4f93-a079-8d600e967c5a
Publication status	Published - 2022

Keywords

Motion planning
Machine learning
Interaction
Collision avoidance
Autonomous Vehicles
Navigation among humans
Decision-making
Dynamic environments
Reinforcement Learning
Trajectory prediction
mobile robots

Access to Document

10.4233/uuid:0bc3677d-dc15-4f93-a079-8d600e967c5a

BrunoBrito_dissertationFinal published version, 19.1 MB

Cite this

@phdthesis{0bc3677ddc154f93a0798d600e967c5a,

title = "Interaction-Aware Motion Planning in Crowded Dynamic Environments",

abstract = "Autonomous robots will profoundly impact our society, making our roads safer, reducing labor costs and carbon dioxide (CO2) emissions, and improving our life quality. However, to make that happen, robots need to navigate among humans, which is extremely difficult. Firstly, humans do not explicitly communicate their intentions and use intuition to reason about others' plans to avoid collisions. Secondly, humans exploit interactions to navigate efficiently in cluttered environments. Traditional motion planning methods for autonomous navigation in human environments use geometry, physics, topologies, and handcrafted functions to account for interaction but only plan one step. In contrast, trajectory optimization methods allow planning over a prediction horizon accounting for the environment evolution. Yet, these methods scale poorly with the number of agents and assume structured scenarios with a limited number of interacting agents. Learning-based approaches overcome the latter by learning a policy's parameters offline, e.g., from data or simulation. However, to date, learned policies show poor performance and unpredictable behavior when employed in reality as the conditions differ from the learning environment. Moreover, learning-based approaches do not guarantee collision avoidance or feasibility with respect to the robot dynamics. Therefore, this thesis aims to develop motion planning algorithms generating online predictive and interaction-aware motion plans to enable robots' safe and efficient navigation among humans. The first main contribution of this thesis is a predictive motion planning algorithm for autonomous robot navigation in unstructured environments populated with pedestrians. The proposed method builds on nonlinear model-predictive contouring control proposing a local formulation (LMPCC) to generate predictive motion plans in real-time. Static collision avoidance is achieved by constraining the robot's positions to stay within a set of convex regions approximating the surrounding free space computed from a static map. Moreover, an upper bound for the Minkowski sum of a circle and an ellipse is proposed and used as an inequality constraint to ensure dynamic collision avoidance, assuming the robot's space as a circle and the dynamic obstacles', for instance pedestrians, space ellipsoid. The LMPCC approach is analyzed and compared against a reactive and a learning-based approach in simulation. Experimentally, the method is tested fully onboard on a mobile robot platform (Clearpath Jackal) and in an autonomous car (Toyota Prius).In real scenarios, pedestrians do not explicitly communicate their intentions, and therefore, LMPCC uses a constant velocity (CV) model to estimate their future trajectories. However, CV predictions ignore the environment constraints, e.g., static obstacles, the interaction between agents, and the inherent uncertainty and multimodality of the pedestrians' motion. Hence, this thesis presents a variational recurrent neural network architecture (Social-VRNN) for interaction-aware and multi-modal trajectory predictions. The Social-VRNN fuses information of the pedestrian's dynamics, static obstacles, and surrounding pedestrians and outputs the parameters of a Gaussian Mixture Model (GMM). A variational Bayesian learning approach is employed to learn the model's parameters minimizing the evidence lower bound (ELBO). Experimental results on real and simulation data are presented, showing that our model can effectively learn to predict multiple trajectories capturing the different courses that a pedestrian may follow.Enhancing the LMPCC method with interaction-aware predictions is insufficient to enable safe and efficient autonomous navigation in cluttered environments. The LMPCC is a local trajectory optimization method and considers a limited planning horizon to enable online motion planning. Consequently, LMPCC plans can be locally optimal, which may result in catastrophic failures, such as deadlocks and collisions, in the long term. To overcome the latter, global guidance, e.g., cost-to-go heuristics, to the optimization problem is an option. In contrast to optimization-based methods, learning-based methods, i.e., deep reinforcement learning (DRL), allow learning policies to optimize long-term rewards in an offline training phase. Therefore, this thesis introduces two novel frameworks enhancing state-of-art online optimization-based planners with learned global guidance policies applied to mobile robot navigation in cluttered environments and autonomous vehicles driving in dense traffic. Firstly, the Goal-Oriented Model Predictive Controller (GO-MPC) is introduced, tackling the problem that the robot{\textquoteright}s global goal is often located far beyond the planning horizon resulting in locally optimal motion plans. The framework proposes to use DRL to learn an interaction-aware policy providing the next optimal subgoal position to an MPC planner. The recommended subgoal helps the robot progress towards its end goal and accounts for the expected interaction with other agents. Based on the recommended subgoal, the MPC planner then optimizes the inputs for the robot, satisfying its kinodynamic and collision avoidance constraints. Simulation results are presented demonstrating that GO-MPC enhances the navigation performance in terms of safety and efficiency, i.e., travel time, compared to solely based MPC and deep RL frameworks in mixed settings, i.e., with cooperative and non-cooperative agents, and multi-robot scenarios.Secondly, the interactive Model Predictive Controller (IntMPC) for safe navigation in dense traffic scenarios is presented. While GO-MPC learns a subgoal policy, the IntMPC learns a velocity reference policy exploring the connection between human driving behavior and their velocity changes when interacting. Hence, the IntMPC approach learns, via deep Reinforcement Learning (RL), an interaction-aware policy providing global guidance as a velocity reference allowing to control and exploit the interaction with the other vehicles. Simulation results are presented demonstrating that the learned policy can reason about the cooperativeness of other vehicles and enable the local planner with interactive behavior to pro-actively merge in dense traffic while remaining safe in case the other vehicles do not yield. Overall, this thesis contributes to enhancing autonomous robots with predictive behavior, with the ability to infer the others' trajectories and operate in cluttered environments.However, two important limitations remain to be solved: the proposed motion planner computes open-loop interaction-aware motion plans and does not account for interaction in closed-loop. Moreover, the prediction model and guidance policies rely solely on offline learning. Future works may investigate how to account for interaction in the planning stage and how online data streams can be used to improve the navigation algorithm's performance over time.",

keywords = "Motion planning, Machine learning, Interaction, Collision avoidance, Autonomous Vehicles, Navigation among humans, Decision-making, Dynamic environments, Reinforcement Learning, Trajectory prediction, mobile robots",

author = "{Ferreira de Brito}, B.F.",

year = "2022",

doi = "10.4233/uuid:0bc3677d-dc15-4f93-a079-8d600e967c5a",

language = "English",

isbn = "978-94-6366-605-3",

type = "Dissertation (TU Delft)",

school = "Delft University of Technology",

}

TY - THES

T1 - Interaction-Aware Motion Planning in Crowded Dynamic Environments

AU - Ferreira de Brito, B.F.

PY - 2022

Y1 - 2022

N2 - Autonomous robots will profoundly impact our society, making our roads safer, reducing labor costs and carbon dioxide (CO2) emissions, and improving our life quality. However, to make that happen, robots need to navigate among humans, which is extremely difficult. Firstly, humans do not explicitly communicate their intentions and use intuition to reason about others' plans to avoid collisions. Secondly, humans exploit interactions to navigate efficiently in cluttered environments. Traditional motion planning methods for autonomous navigation in human environments use geometry, physics, topologies, and handcrafted functions to account for interaction but only plan one step. In contrast, trajectory optimization methods allow planning over a prediction horizon accounting for the environment evolution. Yet, these methods scale poorly with the number of agents and assume structured scenarios with a limited number of interacting agents. Learning-based approaches overcome the latter by learning a policy's parameters offline, e.g., from data or simulation. However, to date, learned policies show poor performance and unpredictable behavior when employed in reality as the conditions differ from the learning environment. Moreover, learning-based approaches do not guarantee collision avoidance or feasibility with respect to the robot dynamics. Therefore, this thesis aims to develop motion planning algorithms generating online predictive and interaction-aware motion plans to enable robots' safe and efficient navigation among humans. The first main contribution of this thesis is a predictive motion planning algorithm for autonomous robot navigation in unstructured environments populated with pedestrians. The proposed method builds on nonlinear model-predictive contouring control proposing a local formulation (LMPCC) to generate predictive motion plans in real-time. Static collision avoidance is achieved by constraining the robot's positions to stay within a set of convex regions approximating the surrounding free space computed from a static map. Moreover, an upper bound for the Minkowski sum of a circle and an ellipse is proposed and used as an inequality constraint to ensure dynamic collision avoidance, assuming the robot's space as a circle and the dynamic obstacles', for instance pedestrians, space ellipsoid. The LMPCC approach is analyzed and compared against a reactive and a learning-based approach in simulation. Experimentally, the method is tested fully onboard on a mobile robot platform (Clearpath Jackal) and in an autonomous car (Toyota Prius).In real scenarios, pedestrians do not explicitly communicate their intentions, and therefore, LMPCC uses a constant velocity (CV) model to estimate their future trajectories. However, CV predictions ignore the environment constraints, e.g., static obstacles, the interaction between agents, and the inherent uncertainty and multimodality of the pedestrians' motion. Hence, this thesis presents a variational recurrent neural network architecture (Social-VRNN) for interaction-aware and multi-modal trajectory predictions. The Social-VRNN fuses information of the pedestrian's dynamics, static obstacles, and surrounding pedestrians and outputs the parameters of a Gaussian Mixture Model (GMM). A variational Bayesian learning approach is employed to learn the model's parameters minimizing the evidence lower bound (ELBO). Experimental results on real and simulation data are presented, showing that our model can effectively learn to predict multiple trajectories capturing the different courses that a pedestrian may follow.Enhancing the LMPCC method with interaction-aware predictions is insufficient to enable safe and efficient autonomous navigation in cluttered environments. The LMPCC is a local trajectory optimization method and considers a limited planning horizon to enable online motion planning. Consequently, LMPCC plans can be locally optimal, which may result in catastrophic failures, such as deadlocks and collisions, in the long term. To overcome the latter, global guidance, e.g., cost-to-go heuristics, to the optimization problem is an option. In contrast to optimization-based methods, learning-based methods, i.e., deep reinforcement learning (DRL), allow learning policies to optimize long-term rewards in an offline training phase. Therefore, this thesis introduces two novel frameworks enhancing state-of-art online optimization-based planners with learned global guidance policies applied to mobile robot navigation in cluttered environments and autonomous vehicles driving in dense traffic. Firstly, the Goal-Oriented Model Predictive Controller (GO-MPC) is introduced, tackling the problem that the robot’s global goal is often located far beyond the planning horizon resulting in locally optimal motion plans. The framework proposes to use DRL to learn an interaction-aware policy providing the next optimal subgoal position to an MPC planner. The recommended subgoal helps the robot progress towards its end goal and accounts for the expected interaction with other agents. Based on the recommended subgoal, the MPC planner then optimizes the inputs for the robot, satisfying its kinodynamic and collision avoidance constraints. Simulation results are presented demonstrating that GO-MPC enhances the navigation performance in terms of safety and efficiency, i.e., travel time, compared to solely based MPC and deep RL frameworks in mixed settings, i.e., with cooperative and non-cooperative agents, and multi-robot scenarios.Secondly, the interactive Model Predictive Controller (IntMPC) for safe navigation in dense traffic scenarios is presented. While GO-MPC learns a subgoal policy, the IntMPC learns a velocity reference policy exploring the connection between human driving behavior and their velocity changes when interacting. Hence, the IntMPC approach learns, via deep Reinforcement Learning (RL), an interaction-aware policy providing global guidance as a velocity reference allowing to control and exploit the interaction with the other vehicles. Simulation results are presented demonstrating that the learned policy can reason about the cooperativeness of other vehicles and enable the local planner with interactive behavior to pro-actively merge in dense traffic while remaining safe in case the other vehicles do not yield. Overall, this thesis contributes to enhancing autonomous robots with predictive behavior, with the ability to infer the others' trajectories and operate in cluttered environments.However, two important limitations remain to be solved: the proposed motion planner computes open-loop interaction-aware motion plans and does not account for interaction in closed-loop. Moreover, the prediction model and guidance policies rely solely on offline learning. Future works may investigate how to account for interaction in the planning stage and how online data streams can be used to improve the navigation algorithm's performance over time.

AB - Autonomous robots will profoundly impact our society, making our roads safer, reducing labor costs and carbon dioxide (CO2) emissions, and improving our life quality. However, to make that happen, robots need to navigate among humans, which is extremely difficult. Firstly, humans do not explicitly communicate their intentions and use intuition to reason about others' plans to avoid collisions. Secondly, humans exploit interactions to navigate efficiently in cluttered environments. Traditional motion planning methods for autonomous navigation in human environments use geometry, physics, topologies, and handcrafted functions to account for interaction but only plan one step. In contrast, trajectory optimization methods allow planning over a prediction horizon accounting for the environment evolution. Yet, these methods scale poorly with the number of agents and assume structured scenarios with a limited number of interacting agents. Learning-based approaches overcome the latter by learning a policy's parameters offline, e.g., from data or simulation. However, to date, learned policies show poor performance and unpredictable behavior when employed in reality as the conditions differ from the learning environment. Moreover, learning-based approaches do not guarantee collision avoidance or feasibility with respect to the robot dynamics. Therefore, this thesis aims to develop motion planning algorithms generating online predictive and interaction-aware motion plans to enable robots' safe and efficient navigation among humans. The first main contribution of this thesis is a predictive motion planning algorithm for autonomous robot navigation in unstructured environments populated with pedestrians. The proposed method builds on nonlinear model-predictive contouring control proposing a local formulation (LMPCC) to generate predictive motion plans in real-time. Static collision avoidance is achieved by constraining the robot's positions to stay within a set of convex regions approximating the surrounding free space computed from a static map. Moreover, an upper bound for the Minkowski sum of a circle and an ellipse is proposed and used as an inequality constraint to ensure dynamic collision avoidance, assuming the robot's space as a circle and the dynamic obstacles', for instance pedestrians, space ellipsoid. The LMPCC approach is analyzed and compared against a reactive and a learning-based approach in simulation. Experimentally, the method is tested fully onboard on a mobile robot platform (Clearpath Jackal) and in an autonomous car (Toyota Prius).In real scenarios, pedestrians do not explicitly communicate their intentions, and therefore, LMPCC uses a constant velocity (CV) model to estimate their future trajectories. However, CV predictions ignore the environment constraints, e.g., static obstacles, the interaction between agents, and the inherent uncertainty and multimodality of the pedestrians' motion. Hence, this thesis presents a variational recurrent neural network architecture (Social-VRNN) for interaction-aware and multi-modal trajectory predictions. The Social-VRNN fuses information of the pedestrian's dynamics, static obstacles, and surrounding pedestrians and outputs the parameters of a Gaussian Mixture Model (GMM). A variational Bayesian learning approach is employed to learn the model's parameters minimizing the evidence lower bound (ELBO). Experimental results on real and simulation data are presented, showing that our model can effectively learn to predict multiple trajectories capturing the different courses that a pedestrian may follow.Enhancing the LMPCC method with interaction-aware predictions is insufficient to enable safe and efficient autonomous navigation in cluttered environments. The LMPCC is a local trajectory optimization method and considers a limited planning horizon to enable online motion planning. Consequently, LMPCC plans can be locally optimal, which may result in catastrophic failures, such as deadlocks and collisions, in the long term. To overcome the latter, global guidance, e.g., cost-to-go heuristics, to the optimization problem is an option. In contrast to optimization-based methods, learning-based methods, i.e., deep reinforcement learning (DRL), allow learning policies to optimize long-term rewards in an offline training phase. Therefore, this thesis introduces two novel frameworks enhancing state-of-art online optimization-based planners with learned global guidance policies applied to mobile robot navigation in cluttered environments and autonomous vehicles driving in dense traffic. Firstly, the Goal-Oriented Model Predictive Controller (GO-MPC) is introduced, tackling the problem that the robot’s global goal is often located far beyond the planning horizon resulting in locally optimal motion plans. The framework proposes to use DRL to learn an interaction-aware policy providing the next optimal subgoal position to an MPC planner. The recommended subgoal helps the robot progress towards its end goal and accounts for the expected interaction with other agents. Based on the recommended subgoal, the MPC planner then optimizes the inputs for the robot, satisfying its kinodynamic and collision avoidance constraints. Simulation results are presented demonstrating that GO-MPC enhances the navigation performance in terms of safety and efficiency, i.e., travel time, compared to solely based MPC and deep RL frameworks in mixed settings, i.e., with cooperative and non-cooperative agents, and multi-robot scenarios.Secondly, the interactive Model Predictive Controller (IntMPC) for safe navigation in dense traffic scenarios is presented. While GO-MPC learns a subgoal policy, the IntMPC learns a velocity reference policy exploring the connection between human driving behavior and their velocity changes when interacting. Hence, the IntMPC approach learns, via deep Reinforcement Learning (RL), an interaction-aware policy providing global guidance as a velocity reference allowing to control and exploit the interaction with the other vehicles. Simulation results are presented demonstrating that the learned policy can reason about the cooperativeness of other vehicles and enable the local planner with interactive behavior to pro-actively merge in dense traffic while remaining safe in case the other vehicles do not yield. Overall, this thesis contributes to enhancing autonomous robots with predictive behavior, with the ability to infer the others' trajectories and operate in cluttered environments.However, two important limitations remain to be solved: the proposed motion planner computes open-loop interaction-aware motion plans and does not account for interaction in closed-loop. Moreover, the prediction model and guidance policies rely solely on offline learning. Future works may investigate how to account for interaction in the planning stage and how online data streams can be used to improve the navigation algorithm's performance over time.

KW - Motion planning

KW - Machine learning

KW - Interaction

KW - Collision avoidance

KW - Autonomous Vehicles

KW - Navigation among humans

KW - Decision-making

KW - Dynamic environments

KW - Reinforcement Learning

KW - Trajectory prediction

KW - mobile robots

U2 - 10.4233/uuid:0bc3677d-dc15-4f93-a079-8d600e967c5a

DO - 10.4233/uuid:0bc3677d-dc15-4f93-a079-8d600e967c5a

M3 - Dissertation (TU Delft)

SN - 978-94-6366-605-3

ER -

Interaction-Aware Motion Planning in Crowded Dynamic Environments

Abstract

Keywords

Access to Document

Fingerprint

Cite this