Title | Rethinking Population-Assisted Off-policy Reinforcement Learning |
Author | Zheng,Bowen; Cheng,Ran |
Corresponding Author | Cheng,Ran |
DOI | |
Publication Years | 2023-07-15 |
Conference Name | Genetic and Evolutionary Computation Conference (GECCO) |
Source Title | |
Pages | 624-632 |
Conference Date | JUL 15-19, 2023 |
Conference Place | Lisbon, PORTUGAL |
Publication Place | 1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES
|
Publisher | Association for Computing Machinery (ACM) |
Abstract | While off-policy reinforcement learning (RL) algorithms are sample efficient due to gradient-based updates and data reuse in the replay buffer, they are prone to converging to local optima due to limited exploration. On the other hand, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated these two methods, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data could introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments on our tailored framework in robot locomotion tasks from OpenAI Gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods. |
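The remedy the abstract describes, a double replay buffer that supplies more on-policy data, can be illustrated with a minimal sketch: transitions collected by the RL agent itself are stored separately from transitions collected by the evolutionary population, and each gradient-update batch is drawn mostly from the agent's own, more on-policy, data. The class name, the mixing ratio, and all other details below are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of a double replay buffer, assuming transitions are
# (s, a, r, s', done) tuples; not the paper's actual code.
import random
from collections import deque

class DoubleReplayBuffer:
    def __init__(self, capacity=100_000, rl_fraction=0.8):
        self.rl_buffer = deque(maxlen=capacity)   # near on-policy data from the RL actor
        self.pop_buffer = deque(maxlen=capacity)  # diverse data from population individuals
        self.rl_fraction = rl_fraction            # assumed share of each batch from rl_buffer

    def add(self, transition, from_population=False):
        # Route each transition to the buffer matching its source.
        (self.pop_buffer if from_population else self.rl_buffer).append(transition)

    def sample(self, batch_size):
        # Draw mostly from the RL actor's buffer; top up with population data.
        n_rl = min(int(batch_size * self.rl_fraction), len(self.rl_buffer))
        n_pop = min(batch_size - n_rl, len(self.pop_buffer))
        return (random.sample(list(self.rl_buffer), n_rl)
                + random.sample(list(self.pop_buffer), n_pop))
```

A hybrid trainer would call `add(..., from_population=True)` for rollouts of population individuals and sample batches dominated by the RL-side buffer when updating the critic, which is one plausible way to limit the off-policy error the paper attributes to population data.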
Keywords | |
SUSTech Authorship | First; Corresponding |
Language | English |
URL | [Source Record] |
Indexed By | |
Funding Project | Program for Guangdong Introducing Innovative and Entrepreneurial Teams [2017ZT07X386] |
WOS Research Area | Computer Science |
WOS Subject | Computer Science, Artificial Intelligence; Computer Science, Information Systems |
WOS Accession No | WOS:001031455100070 |
Scopus EID | 2-s2.0-85167728613 |
Data Source | Scopus |
Citation statistics | Cited Times [WOS]: 0 |
Document Type | Conference paper |
Identifier | http://kc.sustech.edu.cn/handle/2SGJ60CL/559823 |
Department | Department of Computer Science and Engineering |
Affiliation | Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China |
First Author Affiliation | Department of Computer Science and Engineering |
Corresponding Author Affiliation | Department of Computer Science and Engineering |
First Author's First Affiliation | Department of Computer Science and Engineering |
Recommended Citation GB/T 7714 | Zheng, Bowen, Cheng, Ran. Rethinking Population-Assisted Off-policy Reinforcement Learning[C]. New York, NY: Association for Computing Machinery, 2023: 624-632. |
Files in This Item: | There are no files associated with this item. |