Abstraction-Guided Policy Recovery from Expert Demonstrations

C.T. Ponnambalam; F.A. Oliehoek; M.T.J. Spaan

Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review

Abstract

Behavior cloning is a method of automated decision-making that aims to extract meaningful information from expert demonstrations and reproduce the same behavior autonomously. Demonstrations are unlikely to cover the problem space exhaustively, which compromises the quality of automation when out-of-distribution states are encountered. Our approach, RECO, jointly learns an imitation policy and a recovery policy from expert data. The recovery policy steers the agent from unknown states back to the demonstrated states in the data set. While there is, by definition, no data available to learn the recovery policy, we exploit abstractions to generalize beyond the available data and simulate the recovery problem. When the most appropriate abstraction for the given data is unknown, our method selects the best recovery policy from a set generated by several candidate abstractions. In tabular domains, where we assume an agent must call a human supervisor for help if it is in an unknown state, we show that RECO results in drastically fewer calls without compromising solution quality and with relatively few trajectories provided by an expert. We also introduce a continuous adaptation of our method and demonstrate the ability of RECO to recover an agent from states where its supervised learning-based imitation policy would otherwise fail.
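The interplay between imitation, recovery, and supervisor calls in the tabular setting can be illustrated with a small sketch. This is not the RECO algorithm from the paper: the 2-D grid, the deterministic moves in `ACTIONS`, the block abstraction in `abstract`, and the greedy nearest-known-state rule in `recovery_action` are all assumptions made here for illustration only.

```python
from collections import Counter, defaultdict

# Hypothetical deterministic grid moves (an assumption, not from the paper).
ACTIONS = {"up": (0, 1), "down": (0, -1), "left": (-1, 0), "right": (1, 0)}


def fit_imitation_policy(demos):
    """Tabular behavior cloning: most frequent expert action per state."""
    counts = defaultdict(Counter)
    for trajectory in demos:
        for state, action in trajectory:
            counts[state][action] += 1
    return {s: c.most_common(1)[0][0] for s, c in counts.items()}


def abstract(state, cell=2):
    """Toy abstraction: group grid states into cell-by-cell blocks."""
    x, y = state
    return (x // cell, y // cell)


def recovery_action(state, known_states):
    """Greedy stand-in for a recovery policy: pick the move whose
    successor is closest (Manhattan distance) to a demonstrated state."""
    known = list(known_states)

    def distance_to_data(s):
        return min(abs(s[0] - k[0]) + abs(s[1] - k[1]) for k in known)

    def successor(a):
        dx, dy = ACTIONS[a]
        return (state[0] + dx, state[1] + dy)

    # sorted() makes tie-breaking between equally good moves deterministic.
    return min(sorted(ACTIONS), key=lambda a: distance_to_data(successor(a)))


def act(state, imitation, cell=2):
    """Imitate in known states, recover in unknown states whose abstract
    block contains demonstrated data, otherwise call the supervisor."""
    if state in imitation:
        return imitation[state]
    known_blocks = {abstract(s, cell) for s in imitation}
    if abstract(state, cell) in known_blocks:
        return recovery_action(state, imitation)
    return "call_supervisor"


# One hypothetical expert trajectory of (state, action) pairs.
demos = [[((0, 0), "right"), ((1, 0), "right"), ((2, 0), "up")]]
policy = fit_imitation_policy(demos)
```

With this toy data, `act((1, 0), policy)` imitates the expert, `act((1, 1), policy)` falls back to the recovery rule because its block contains demonstrated states, and `act((5, 5), policy)` calls the supervisor. A coarser `cell` makes more unknown states count as recoverable, which loosely mirrors why the choice of abstraction matters.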

Original language	English
Title of host publication	31st International Conference on Automated Planning and Scheduling
Publisher	American Association for Artificial Intelligence (AAAI)
Pages	560-568
Number of pages	9
Publication status	Published - 2021
Event	31st International Conference on Automated Planning and Scheduling, virtual/online event, 7 Jun 2021 → 12 Jun 2021 (conference number: 31)

Conference

Conference	31st International Conference on Automated Planning and Scheduling
Abbreviated title	ICAPS 2021
Period	7/06/21 → 12/06/21

Cite this

@inproceedings{be465f1be6db4863940902e93b356e2d,
  title = "Abstraction-Guided Policy Recovery from Expert Demonstrations",
  author = "C.T. Ponnambalam and F.A. Oliehoek and M.T.J. Spaan",
  year = "2021",
  language = "English",
  pages = "560--568",
  booktitle = "31st International Conference on Automated Planning and Scheduling",
  publisher = "American Association for Artificial Intelligence (AAAI)",
  address = "United States",
  note = "31st International Conference on Automated Planning and Scheduling, ICAPS 2021; Conference date: 07-06-2021 through 12-06-2021",
}

Abstraction-Guided Policy Recovery from Expert Demonstrations. / Ponnambalam, C.T.; Oliehoek, F.A.; Spaan, M.T.J.
31st International Conference on Automated Planning and Scheduling. American Association for Artificial Intelligence (AAAI), 2021. p. 560-568.

TY - GEN
T1 - Abstraction-Guided Policy Recovery from Expert Demonstrations
AU - Ponnambalam, C.T.
AU - Oliehoek, F.A.
AU - Spaan, M.T.J.
N1 - Conference code: 31
PY - 2021
Y1 - 2021
M3 - Conference contribution
SP - 560
EP - 568
BT - 31st International Conference on Automated Planning and Scheduling
PB - American Association for Artificial Intelligence (AAAI)
T2 - 31st International Conference on Automated Planning and Scheduling
Y2 - 7 June 2021 through 12 June 2021
ER -
