Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

Ji Cheng, Bo Xue, Jiaxiang Yi, Qingfu Zhang*

*Corresponding author for this work

Research output: Contribution to journal › Conference article › Scientific › peer-review

Abstract

The multi-objective stochastic linear bandit (MOSLB) plays a critical role in the sequential decision-making paradigm. However, most existing methods focus on Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers’ preferences. We adopt the Grossone approach to deal with these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learners’ performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, we adapt the upper confidence bound (UCB) policy and the prior-free lexicographical filter to approximate the optimal arms at each round. Moreover, the algorithmic framework involves two stages to balance exploration and exploitation. Theoretical analysis and numerical experiments demonstrate the effectiveness of our algorithms.
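As a rough illustration of the two orderings the abstract combines, the sketch below compares expected-reward vectors under plain Pareto dominance and under a plain lexicographic order. It uses the textbook definitions for maximization only. It is not the paper's mixed Pareto-lexicographic order or its Grossone-based formulation, and the function names and example reward vectors are hypothetical.

```python
from typing import List, Sequence


def pareto_dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u is at least as good as v in every objective
    and strictly better in at least one (maximization)."""
    return all(a >= b for a, b in zip(u, v)) and any(a > b for a, b in zip(u, v))


def lex_dominates(u: Sequence[float], v: Sequence[float]) -> bool:
    """Return True if u beats v lexicographically: objectives are listed
    in decreasing priority and the first differing objective decides."""
    for a, b in zip(u, v):
        if a != b:
            return a > b
    return False  # identical vectors: neither dominates


def pareto_front(rewards: List[Sequence[float]]) -> List[int]:
    """Indices of arms whose reward vector is not Pareto-dominated by any other arm."""
    return [
        i
        for i, u in enumerate(rewards)
        if not any(pareto_dominates(v, u) for j, v in enumerate(rewards) if j != i)
    ]


# Hypothetical expected rewards for three arms and two objectives.
rewards = [(1.0, 0.2), (0.5, 0.9), (0.4, 0.1)]
front = pareto_front(rewards)
best_lex = max(range(len(rewards)), key=lambda i: tuple(rewards[i]))
```

In this toy instance the first two arms are mutually non-dominated under Pareto dominance, while the lexicographic order singles out the first arm because it wins on the highest-priority objective. This gap between "several incomparable optima" and "one preferred optimum" is the kind of preference information a priority-aware order can express.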

Original language: English
Pages (from-to): 11489-11497
Number of pages: 9
Journal: Proceedings of the AAAI Conference on Artificial Intelligence
Volume: 38
Issue number: 10
Publication status: Published - 2024
Event: 38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration: 20 Feb 2024 to 27 Feb 2024

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • Reinforcement Learning
  • Online Learning & Bandits
