Reinforcement Learning for the Knapsack Problem

Jacopo Pierotti*, Maximilian Kronmueller, Javier Alonso-Mora, J. Theresia van Essen, Wendelin Böhmer

*Corresponding author for this work

Research output: Chapter in Book/Conference proceedings/Edited volumeChapterScientificpeer-review

452 Downloads (Pure)

Abstract

Combinatorial optimization (CO) problems are at the heart of both practical and theoretical research. Due to their complexity, many problems cannot be solved via exact methods in reasonable time; hence, we resort to heuristic solution methods. In recent years, machine learning (ML) has brought immense benefits in many research areas, including heuristic solution methods for CO problems. Among ML methods, reinforcement learning (RL) seems to be the most promising method to find good solutions for CO problems. In this work, we investigate an RL framework, whose agent is based on self-attention, to achieve solutions for the knapsack problem, which is a CO problem. Our algorithm finds close to optimal solutions for instances up to one hundred items, which leads to conjecture that RL and self-attention may be major building blocks for future state-of-the-art heuristics for other CO problems.

Original languageEnglish
Title of host publicationAIRO Springer Series
PublisherSpringer Nature
Pages3-13
Number of pages11
DOIs
Publication statusPublished - 2021

Publication series

NameAIRO Springer Series
Volume6
ISSN (Print)2523-7047
ISSN (Electronic)2523-7055

Bibliographical note

Green Open Access added to TU Delft Institutional Repository ‘You share, we take care!’ – Taverne project https://www.openaccess.nl/en/you-share-we-take-care
Otherwise as indicated in the copyright section: the publisher is the copyright holder of this work and the author uses the Dutch legislation to make this work public.

Keywords

  • End-to-end
  • Knapsack problem
  • Multi-task DQN
  • Reinforcement learning
  • Self-attention
  • Transformer

Fingerprint

Dive into the research topics of 'Reinforcement Learning for the Knapsack Problem'. Together they form a unique fingerprint.

Cite this