
Oikonomos-II: A Reinforcement-Learning, Resource-Recommendation System for Cloud HPC

J. L. F. Betting, C. I. De Zeeuw, C. Strydis

Research output: Chapter in Book/Conference proceedings/Edited volume, Conference contribution, Scientific, peer-review

Abstract

The cloud has become a powerful and useful environment for deploying High-Performance Computing (HPC) applications, but the large number of available instance types makes selecting the optimal platform challenging. Users often lack the time or knowledge needed to make an optimal choice. Recommender systems have been developed for this purpose, but current state-of-the-art systems require either large amounts of training data or multiple runs of the application, both of which are costly. In this work, we propose Oikonomos-II, a resource-recommendation system based on reinforcement learning for HPC applications in the cloud. Oikonomos-II models the relationship between different input parameters, instance types, and execution times. The system does not require any preexisting training data or repeated job executions, as it gathers its own training data opportunistically from user-submitted jobs, employing a variant of the Neural-LinUCB algorithm. When deployed on a mix of HPC applications, Oikonomos-II quickly converged towards an optimal policy. By eliminating the need for preexisting training data or auxiliary runs, the system provides an economical, general-purpose resource-recommendation system for cloud HPC.
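
The abstract names a variant of Neural-LinUCB without detailing it. As a rough, non-authoritative illustration of the contextual-bandit loop such a system builds on, the sketch below runs plain disjoint LinUCB with one arm per instance type. Each user-submitted job is one round: the recommender picks an instance, the job runs, and the observed cost is fed back as a negative reward. Neural-LinUCB roughly replaces hand-made features with a learned neural representation and explores linearly on top of it; here the feature map is fixed to `[1, input size]`. The instance names (`small`, `large`), the synthetic cost model, and all class and function names are hypothetical and are not taken from Oikonomos-II.

```python
import math
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def mat_vec(m, v):
    """Multiply a square matrix (list of rows) by a vector."""
    return [dot(row, v) for row in m]


class LinUCBArm:
    """Ridge-regression state for one instance type (disjoint LinUCB)."""

    def __init__(self, dim, ridge=1.0):
        # Store A^{-1} directly; A starts as ridge * I, so A^{-1} = I / ridge.
        self.a_inv = [[1.0 / ridge if i == j else 0.0 for j in range(dim)]
                      for i in range(dim)]
        self.b = [0.0] * dim  # running sum of reward * context
        self.pulls = 0

    def score(self, x, alpha):
        """Predicted reward plus alpha-scaled confidence width at context x."""
        theta = mat_vec(self.a_inv, self.b)
        width = math.sqrt(max(dot(x, mat_vec(self.a_inv, x)), 0.0))
        return dot(theta, x) + alpha * width

    def update(self, x, reward):
        # Sherman-Morrison rank-one update of A^{-1} after A += x x^T.
        u = mat_vec(self.a_inv, x)  # A^{-1} x (A^{-1} is symmetric)
        denom = 1.0 + dot(x, u)
        for i in range(len(x)):
            for j in range(len(x)):
                self.a_inv[i][j] -= u[i] * u[j] / denom
        for i in range(len(x)):
            self.b[i] += reward * x[i]
        self.pulls += 1


class InstanceRecommender:
    """Hypothetical recommender: one LinUCB arm per cloud instance type."""

    def __init__(self, instance_types, dim, alpha=1.0):
        self.arms = {name: LinUCBArm(dim) for name in instance_types}
        self.alpha = alpha

    def recommend(self, x, alpha=None):
        """Pick the arm with the highest upper confidence bound (alpha=0 is greedy)."""
        a = self.alpha if alpha is None else alpha
        return max(self.arms, key=lambda name: self.arms[name].score(x, a))

    def observe(self, name, x, reward):
        self.arms[name].update(x, reward)


def simulated_cost(instance, size):
    # Made-up cost model: "small" is cheap for small inputs, while "large"
    # has a fixed overhead but scales better with input size.
    return size if instance == "small" else 1.0 + 0.3 * size


def train_on_simulated_jobs(n_jobs=500, seed=0):
    """Treat each simulated user job as one bandit round; reward = -cost."""
    rng = random.Random(seed)
    rec = InstanceRecommender(["small", "large"], dim=2)
    for _ in range(n_jobs):
        size = rng.uniform(0.0, 3.0)
        x = [1.0, size]  # hypothetical context: bias term + input size
        choice = rec.recommend(x)
        rec.observe(choice, x, -simulated_cost(choice, size))
    return rec


rec = train_on_simulated_jobs()
print(rec.recommend([1.0, 0.2], alpha=0.0), rec.recommend([1.0, 2.8], alpha=0.0))
```

Maintaining A^{-1} with the Sherman-Morrison identity avoids a matrix inversion per job. The cost of that choice is slow accumulation of rounding error, which is negligible for a small feature dimension like this one. Because no job is run purely for exploration, every observation comes from a job the user wanted anyway, which mirrors the "no auxiliary runs" idea in the abstract.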
Original language: English
Title of host publication: Proceedings of the 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Place of publication: Piscataway
Publisher: IEEE
Pages: 266-276
Number of pages: 11
ISBN (Electronic): 979-8-3503-8322-5
ISBN (Print): 979-8-3503-8323-2
Publication status: Published - 2023
Event: 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC), Goa, India
Duration: 18 Dec 2023 to 21 Dec 2023

Publication series

Name: Proceedings - 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics, HiPC 2023

Conference

Conference: 2023 IEEE 30th International Conference on High Performance Computing, Data, and Analytics (HiPC)
Country/Territory: India
City: Goa
Period: 18/12/23 to 21/12/23

Bibliographical note

Green Open Access added to the TU Delft Institutional Repository as part of the 'You share, we take care!' Taverne project: https://www.openaccess.nl/en/you-share-we-take-care
Unless otherwise indicated in the copyright section, the publisher is the copyright holder of this work, and the author uses Dutch legislation to make this work public.

Keywords

  • High-Performance Computing
  • resource recommendation
  • cloud computing
  • prediction
  • middleware
