Probabilistic recursive reasoning for multi-agent reinforcement learning


Ying Wen, Yaodong Yang, Rui Luo, Jun Wang^*, Wei Pan

^*Corresponding author for this work

Robot Dynamics

Research output: Contribution to conference › Poster › Scientific

Abstract

Humans are capable of attributing latent mental contents, such as beliefs or intentions, to others. This social skill is critical in everyday life for reasoning about the potential consequences of their behaviors so as to plan ahead. It is known that humans use this reasoning ability recursively, i.e., by considering what others believe about their own beliefs. In this paper, we start from level-1 recursion and introduce a probabilistic recursive reasoning (PR2) framework for multi-agent reinforcement learning. Our hypothesis is that it is beneficial for each agent to account for how the opponents would react to its future behaviors. Under the PR2 framework, we adopt variational Bayes methods to approximate the opponents' conditional policy, to which each agent finds the best response and then improves its own policy. We develop decentralized-training-decentralized-execution algorithms, PR2-Q and PR2-Actor-Critic, that are proven to converge in the self-play scenario when there is one Nash equilibrium. Our methods are tested on both a matrix game and a differential game, each of which has a non-trivial equilibrium to which common gradient-based methods fail to converge. Our experiments show that it is critical to reason about what the opponents believe about what the agent believes. We expect our work to contribute a new idea for modeling opponents to the multi-agent reinforcement learning community.
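
The idea of accounting for how opponents would react to an agent's own action can be illustrated with a toy two-player matrix game. The sketch below is not the PR2-Q or PR2-Actor-Critic algorithm from the paper: it replaces the variational approximation with a simple softmax opponent model, and the payoff tables OWN_PAYOFF and OPP_PAYOFF, the temperature, and all function names are hypothetical choices made only for illustration.

```python
import math


def softmax(values, temperature=1.0):
    """Numerically stable softmax over a list of floats."""
    peak = max(values)
    exps = [math.exp((v - peak) / temperature) for v in values]
    total = sum(exps)
    return [e / total for e in exps]


def opponent_conditional_policy(opp_payoff, temperature=1.0):
    """Assumed opponent model rho(a_opp | a_own).

    Row a_own of opp_payoff holds the opponent's payoff for each of its
    actions given that the agent plays a_own. The softmax stands in for
    a learned conditional policy; it is an assumption of this sketch.
    """
    return [softmax(row, temperature) for row in opp_payoff]


def level1_values(own_payoff, rho):
    """Marginalize the agent's payoff over the opponent's predicted reaction."""
    return [
        sum(p * q for p, q in zip(rho_row, payoff_row))
        for rho_row, payoff_row in zip(rho, own_payoff)
    ]


def best_response(values):
    """Index of the action with the highest expected value."""
    return max(range(len(values)), key=lambda a: values[a])


# Hypothetical coordination-style payoffs, indexed [a_own][a_opp].
OWN_PAYOFF = [[4.0, 0.0], [3.0, 1.0]]
OPP_PAYOFF = [[4.0, 3.0], [0.0, 1.0]]

rho = opponent_conditional_policy(OPP_PAYOFF, temperature=0.1)
values = level1_values(OWN_PAYOFF, rho)
action = best_response(values)

# Baseline that ignores the opponent's reaction (uniform opponent model).
uniform_values = [sum(row) / len(row) for row in OWN_PAYOFF]
```

With these assumed payoffs, the uniform baseline cannot separate the two actions, while conditioning the opponent model on the agent's own action does.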

Original language	English
Number of pages	1
Publication status	Published - 2019
Event	7th International Conference on Learning Representations, ICLR 2019 - New Orleans, United States; Duration: 6 May 2019 → 9 May 2019

Conference

Conference	7th International Conference on Learning Representations, ICLR 2019
Country/Territory	United States
City	New Orleans
Period	6/05/19 → 9/05/19

Access to Document

Final published version, 2.38 MB

Cite this

@conference{3aeb90a8e1154074b62c6d8d664f7178,

title = "Probabilistic recursive reasoning for multi-agent reinforcement learning",

author = "Ying Wen and Yaodong Yang and Rui Luo and Jun Wang and Wei Pan",

year = "2019",

language = "English",

note = "7th International Conference on Learning Representations, ICLR 2019 ; Conference date: 06-05-2019 Through 09-05-2019",

}

TY - CONF

T1 - Probabilistic recursive reasoning for multi-agent reinforcement learning

AU - Wen, Ying

AU - Yang, Yaodong

AU - Luo, Rui

AU - Wang, Jun

AU - Pan, Wei

PY - 2019

Y1 - 2019

UR - http://www.scopus.com/inward/record.url?scp=85071148528&partnerID=8YFLogxK

M3 - Poster

T2 - 7th International Conference on Learning Representations, ICLR 2019

Y2 - 6 May 2019 through 9 May 2019

ER -
