Resource-constrained Multi-agent Markov Decision Processes

Frits de Nijs

Research output: Thesis › Dissertation (TU Delft)



Intelligent autonomous agents, designed to automate and simplify many aspects of our society, will increasingly be required to also interact with other agents autonomously. Where agents interact, they are likely to encounter resource constraints. For example, agents managing household appliances to optimize electricity usage might need to share the limited capacity of the distribution grid.
This thesis describes research into new algorithms for optimizing the behavior of agents operating in constrained environments, when these agents have significant uncertainty about the effects of their actions on their state. Such systems are effectively modeled in a framework of constrained multi-agent Markov decision processes (MDPs). A single-agent MDP model captures the uncertainty in the outcome of the actions chosen by a specific agent. It does so by providing a probabilistic model of state transitions, describing the likelihood of arriving in a future state, conditional on the current state and action. Agents collect different rewards or penalties depending on the current state and chosen action, informing their objective of maximizing their expected reward.

To include constraints, resource consumption functions are added to the actions, and the agents' (shared) objective is modified with a condition restricting their (cumulative) resource consumption. We propose novel algorithms to advance the state of the art in three challenging settings: computing static preallocations off-line, computing dynamic (re)allocations on-line, and optimally learning model dynamics through safe reinforcement learning under the constraints.

Taken together, these algorithms show how agents can coordinate their actions under uncertainty and shared resource constraints in a broad range of conditions. Furthermore, the proposed solutions are complementary: static preallocations can be used as a back-up strategy when a communication disruption prevents the use of dynamic allocations.
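The single-agent model described above can be illustrated with a small sketch. The following toy example (not taken from the thesis; all states, actions, and numbers are invented for illustration) encodes transition probabilities, rewards, and a per-action resource-consumption function, then propagates the state distribution forward to evaluate a fixed policy's expected reward and expected cumulative resource consumption over a finite horizon:

```python
# Toy single-agent constrained MDP: transition probabilities P[s][a][s'],
# rewards R[s][a], and resource consumption C[s][a]. All values below are
# hypothetical, chosen only to illustrate the model's ingredients.

P = {  # P[s][a] = {next_state: probability}
    0: {"on": {0: 0.7, 1: 0.3}, "off": {0: 1.0}},
    1: {"on": {1: 1.0},         "off": {0: 0.6, 1: 0.4}},
}
R = {0: {"on": 5.0, "off": 0.0}, 1: {"on": 2.0, "off": 0.0}}
C = {0: {"on": 1.0, "off": 0.0}, 1: {"on": 1.0, "off": 0.0}}  # resource units

def expected_totals(policy, start, horizon):
    """Expected reward and expected cumulative resource consumption of a
    deterministic stationary policy {state: action} over a finite horizon,
    computed by propagating the state distribution forward in time."""
    dist = {start: 1.0}          # current distribution over states
    reward = consumption = 0.0
    for _ in range(horizon):
        nxt = {}
        for s, p in dist.items():
            a = policy[s]
            reward += p * R[s][a]
            consumption += p * C[s][a]
            for s2, q in P[s][a].items():
                nxt[s2] = nxt.get(s2, 0.0) + p * q
        dist = nxt
    return reward, consumption

r, c = expected_totals({0: "on", 1: "off"}, start=0, horizon=3)
```

A resource constraint as described in the abstract would then take the form of an upper bound on `c` (or, in the multi-agent case, on the sum of such expectations across agents), turning policy selection into a constrained optimization problem.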
Original language: English
Qualification: Doctor of Philosophy
Awarding Institution:
  • Delft University of Technology
Supervisors:
  • de Weerdt, M.M., Supervisor
  • Spaan, M.T.J., Supervisor
Award date: 4 Apr 2019
Electronic ISBNs: 978-94-6375-357-9
Publication status: Published - 21 Feb 2019


Keywords:
  • Decision making under uncertainty
  • Multi-agent systems
  • Optimization
  • Constraint decoupling
  • Reinforcement learning

