Off-policy experience retention for deep actor-critic learning

Tim de Bruin, Jens Kober, K.P. Tuyls, Robert Babuska

    Research output: Chapter in Book/Conference proceedings/Edited volume › Conference contribution › Scientific › peer-review


    When a limited number of experiences is kept in memory to train a reinforcement learning agent, the criterion that determines which experiences are retained can have a strong impact on the learning performance. In this paper, we argue that for actor-critic learning in domains with significant momentum, it is important to retain experiences with off-policy actions when the amount of exploration is reduced over time. This claim is supported by simulation experiments with a pendulum swing-up problem and a magnetic manipulation task. Additionally, we compare our strategy to database overwriting policies based on obtaining experiences spread out over the state-action space, and also to using the temporal difference error as a proxy for the value of experiences.
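    The retention idea in the abstract can be illustrated with a minimal sketch: a fixed-capacity replay buffer that, when full, overwrites the stored experience whose action is closest to what the current policy would choose, so that the most off-policy experiences are kept. This is an assumed, simplified reading of the strategy, not the paper's exact algorithm; the class and function names are hypothetical.

    ```python
    import numpy as np

    class OffPolicyRetentionBuffer:
        """Hypothetical sketch: retain off-policy experiences by overwriting
        the experience whose action is most similar to the current policy's
        action in the same state (i.e., the most on-policy one)."""

        def __init__(self, capacity):
            self.capacity = capacity
            self.buffer = []  # list of (state, action, reward, next_state)

        def add(self, experience, policy):
            """policy: callable mapping a state to the action the current
            (deterministic) actor would take in that state."""
            if len(self.buffer) < self.capacity:
                self.buffer.append(experience)
                return
            # Distance between each stored action and the current policy's
            # action for the same state: small distance = near-on-policy.
            dists = [np.linalg.norm(a - policy(s))
                     for (s, a, _, _) in self.buffer]
            # Overwrite the most on-policy experience, keeping the
            # off-policy ones that exploration would no longer generate.
            self.buffer[int(np.argmin(dists))] = experience

        def sample(self, batch_size, rng=np.random):
            idxs = rng.choice(len(self.buffer), size=batch_size, replace=False)
            return [self.buffer[i] for i in idxs]
    ```

    As exploration noise decays, newly collected experiences cluster around the policy; a first-in-first-out buffer would then forget the off-policy actions entirely, which is what this overwriting rule is meant to prevent.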
    Original language: English
    Title of host publication: Deep Reinforcement Learning Workshop, NIPS 2016 - December 9, 2016
    Number of pages: 9
    Publication status: Published - 2016
    Event: NIPS 2016: 30th Conference on Neural Information Processing Systems - Centre Convencions Internacional Barcelona, Barcelona, Spain
    Duration: 5 Dec 2016 – 10 Dec 2016


    Conference: NIPS 2016: 30th Conference on Neural Information Processing Systems
    Abbreviated title: NIPS


