Title
Rethinking Population-Assisted Off-policy Reinforcement Learning

Author
Zheng,Bowen ; Cheng,Ran
Corresponding Author
Cheng,Ran
DOI
Publication Date
2023-07-15
Conference Name
Genetic and Evolutionary Computation Conference (GECCO)
Source Title
Pages
624-632
Conference Date
JUL 15-19, 2023
Conference Place
Lisbon, Portugal
Publication Place
1601 Broadway, 10th Floor, NEW YORK, NY, UNITED STATES
Publisher
Association for Computing Machinery
Abstract
While off-policy reinforcement learning (RL) algorithms are sample efficient due to gradient-based updates and data reuse in the replay buffer, they are prone to converging to local optima due to limited exploration. On the other hand, population-based algorithms offer a natural exploration strategy, but their heuristic black-box operators are inefficient. Recent algorithms have integrated these two methods, connecting them through a shared replay buffer. However, the effect of using diverse data from population optimization iterations on off-policy RL algorithms has not been thoroughly investigated. In this paper, we first analyze the use of off-policy RL algorithms in combination with population-based algorithms, showing that the use of population data could introduce an overlooked error and harm performance. To test this, we propose a uniform and scalable training design and conduct experiments on our tailored framework in robot locomotion tasks from OpenAI Gym. Our results substantiate that using population data in off-policy RL can cause instability during training and even degrade performance. To remedy this issue, we further propose a double replay buffer design that provides more on-policy data and show its effectiveness through experiments. Our results offer practical insights for training these hybrid methods.
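The double replay buffer design mentioned in the abstract lends itself to a short illustration. The following minimal Python sketch is hypothetical, not the authors' implementation: it assumes a large shared buffer receiving transitions from both the RL agent and the evolutionary population, plus a smaller agent-only buffer, with each training batch mixing the two so more near-on-policy data reaches the off-policy learner. The class name, method names, buffer capacities, and the local_ratio parameter are all illustrative assumptions.

```python
import random
from collections import deque


class DoubleReplayBuffer:
    """Hypothetical sketch of a double replay buffer for
    population-assisted off-policy RL: a shared buffer stores
    transitions from both the RL agent and the population, while
    a local buffer keeps only the RL agent's own (near-on-policy)
    transitions."""

    def __init__(self, shared_capacity=1_000_000,
                 local_capacity=100_000, local_ratio=0.5):
        self.shared = deque(maxlen=shared_capacity)
        self.local = deque(maxlen=local_capacity)
        self.local_ratio = local_ratio  # fraction of each batch drawn locally

    def add_agent(self, transition):
        # Transitions collected by the off-policy RL actor feed both buffers.
        self.shared.append(transition)
        self.local.append(transition)

    def add_population(self, transition):
        # Transitions from population individuals go only to the shared
        # buffer; their behavior policies may be far from the RL policy.
        self.shared.append(transition)

    def sample(self, batch_size):
        # Mix near-on-policy and shared data at the configured ratio.
        n_local = min(int(batch_size * self.local_ratio), len(self.local))
        n_shared = min(batch_size - n_local, len(self.shared))
        return (random.sample(list(self.local), n_local)
                + random.sample(list(self.shared), n_shared))
```

In this sketch, setting local_ratio to 0 recovers a single shared buffer (the common design the paper critiques), while raising it biases updates toward the agent's own recent experience; the exact ratio and capacities here are placeholders, not values from the paper.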
Keywords
SUSTech Authorship
First ; Corresponding
Language
English
URL[Source Record]
Indexed By
Funding Project
Program for Guangdong Introducing Innovative and Entrepreneurial Teams[2017ZT07X386]
WOS Research Area
Computer Science
WOS Subject
Computer Science, Artificial Intelligence ; Computer Science, Information Systems
WOS Accession No
WOS:001031455100070
Scopus EID
2-s2.0-85167728613
Data Source
Scopus
Citation statistics
Cited Times [WOS]: 0
Document Type
Conference paper
Identifier
http://kc.sustech.edu.cn/handle/2SGJ60CL/559823
Department
Department of Computer Science and Engineering
Affiliation
Department of Computer Science and Engineering, Southern University of Science and Technology, Shenzhen, Guangdong, China
First Author Affiliation
Department of Computer Science and Engineering
Corresponding Author Affiliation
Department of Computer Science and Engineering
First Author's First Affiliation
Department of Computer Science and Engineering
Recommended Citation
GB/T 7714
Zheng, Bowen, Cheng, Ran. Rethinking Population-Assisted Off-policy Reinforcement Learning[C]. New York, NY, USA: Association for Computing Machinery, 2023: 624-632.
Files in This Item:
There are no files associated with this item.

Items in the repository are protected by copyright, with all rights reserved, unless otherwise indicated.