Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

Ji Cheng; Bo Xue; Jiaxiang Yi; Qingfu Zhang

doi:10.1609/aaai.v38i10.29030

Ji Cheng, Bo Xue, Jiaxiang Yi, Qingfu Zhang^*

^*Corresponding author for this work

Team Marcel Sluiter

Research output: Contribution to journal › Conference article › Scientific › peer-review

Abstract

The multi-objective stochastic linear bandit (MOSLB) plays a critical role in the sequential decision-making paradigm; however, most existing methods focus on the Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers’ preferences. We adopt the Grossone approach to deal with these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learners’ performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, the upper confidence bound (UCB) policy and the prior-free lexicographical filter are adapted to approximate the optimal arms at each round. Moreover, the framework of the algorithms involves two stages in pursuit of the balance between exploration and exploitation. Theoretical analysis as well as numerical experiments demonstrate the effectiveness of our algorithms.

Original language	English
Pages (from-to)	11489-11497
Number of pages	9
Journal	Proceedings of the AAAI Conference on Artificial Intelligence
Volume	38
Issue number	10
DOIs	https://doi.org/10.1609/aaai.v38i10.29030
Publication status	Published - 2024
Event	38th AAAI Conference on Artificial Intelligence, AAAI 2024 - Vancouver, Canada
Duration	20 Feb 2024 → 27 Feb 2024

Bibliographical note

Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

Reinforcement Learning
Online Learning & Bandits

Access to Document

10.1609/aaai.v38i10.29030

Cite this

@article{4e07db99cc07434597cf85127f4b5c68,

title = "Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits",

abstract = "Multi-objective Stochastic Linear bandit (MOSLB) plays a critical role in the sequential decision-making paradigm, however, most existing methods focus on the Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers{\textquoteright} preferences. We adopt the Grossone approach to deal with these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learners{\textquoteright} performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, the upper confidence bound (UCB) policy and the prior free lexicographical filter are adapted to approximate the optimal arms at each round. Moreover, the framework of the algorithms involves two stages in pursuit of the balance between exploration and exploitation. Theoretical analysis as well as numerical experiments demonstrate the effectiveness of our algorithms.",

keywords = "Reinforcement Learning, Online Learning \& Bandits",

author = "Ji Cheng and Bo Xue and Jiaxiang Yi and Qingfu Zhang",

note = "Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.; 38th AAAI Conference on Artificial Intelligence, AAAI 2024 ; Conference date: 20-02-2024 Through 27-02-2024",

year = "2024",

doi = "10.1609/aaai.v38i10.29030",

language = "English",

volume = "38",

pages = "11489--11497",

journal = "Proceedings of the AAAI Conference on Artificial Intelligence",

issn = "2159-5399",

number = "10",

}

TY - JOUR

T1 - Hierarchize Pareto Dominance in Multi-Objective Stochastic Linear Bandits

AU - Cheng, Ji

AU - Xue, Bo

AU - Yi, Jiaxiang

AU - Zhang, Qingfu

N1 - Green Open Access added to TU Delft Institutional Repository 'You share, we take care!' - Taverne project https://www.openaccess.nl/en/you-share-we-take-care Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

PY - 2024

Y1 - 2024

N2 - Multi-objective Stochastic Linear bandit (MOSLB) plays a critical role in the sequential decision-making paradigm, however, most existing methods focus on the Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers’ preferences. We adopt the Grossone approach to deal with these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learners’ performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, the upper confidence bound (UCB) policy and the prior free lexicographical filter are adapted to approximate the optimal arms at each round. Moreover, the framework of the algorithms involves two stages in pursuit of the balance between exploration and exploitation. Theoretical analysis as well as numerical experiments demonstrate the effectiveness of our algorithms.

AB - Multi-objective Stochastic Linear bandit (MOSLB) plays a critical role in the sequential decision-making paradigm, however, most existing methods focus on the Pareto dominance among different objectives without considering any priority. In this paper, we study bandit algorithms under mixed Pareto-lexicographic orders, which can reflect decision makers’ preferences. We adopt the Grossone approach to deal with these orders and develop the notion of Pareto-lexicographic optimality to evaluate the learners’ performance. Our work represents a first attempt to address these important and realistic orders in bandit algorithms. To design algorithms under these orders, the upper confidence bound (UCB) policy and the prior free lexicographical filter are adapted to approximate the optimal arms at each round. Moreover, the framework of the algorithms involves two stages in pursuit of the balance between exploration and exploitation. Theoretical analysis as well as numerical experiments demonstrate the effectiveness of our algorithms.

KW - Reinforcement Learning

KW - Online Learning & Bandits

UR - http://www.scopus.com/inward/record.url?scp=85189753774&partnerID=8YFLogxK

U2 - 10.1609/aaai.v38i10.29030

DO - 10.1609/aaai.v38i10.29030

M3 - Conference article

AN - SCOPUS:85189753774

SN - 2159-5399

VL - 38

SP - 11489

EP - 11497

JO - Proceedings of the AAAI Conference on Artificial Intelligence

JF - Proceedings of the AAAI Conference on Artificial Intelligence

IS - 10

T2 - 38th AAAI Conference on Artificial Intelligence, AAAI 2024

Y2 - 20 February 2024 through 27 February 2024

ER -
