Deterministic Policy Gradient: Convergence Analysis
The deterministic policy gradient (DPG) method proposed in Silver et al. has been demonstrated to exhibit superior performance, particularly for applications with multi-dimensional and continuous action spaces. However, it remains unclear whether DPG converges, and if so, how fast it converges and whether it converges as efficiently as other policy gradient (PG) methods. In this paper, we provide a theoretical analysis of DPG to answer those questions. We study the single-timescale DPG (often the case in practice) in both on-policy and off-policy settings, and show that both algorithms attain an ε-accurate stationary policy up to a system error with a sample complexity of O(ε^{-2}). Moreover, we establish the convergence rate for DPG under Gaussian noise exploration, which is widely adopted in practice to improve the performance of DPG. To the best of our knowledge, this is the first non-asymptotic convergence characterization for DPG methods.
Science and Technology Program of Jingdezhen City [JCYJ20200109141601708]
|Document Type||Conference paper|
|Department||Department of Mechanical and Energy Engineering|
1. Department of Electrical and Computer Engineering, The Ohio State University, Columbus, United States
2. Department of Electrical and Computer Engineering, National University of Singapore, Singapore, Singapore
3. Department of Mechanical and Energy Engineering, Southern University of Science and Technology (SUSTech), Shenzhen, Guangdong, China
|Corresponding Author Affiliation||Department of Mechanical and Energy Engineering|
Xiong, Huaqing, Xu, Tengyu, Zhao, Lin, et al. Deterministic Policy Gradient: Convergence Analysis[C], 2022: 2159-2169.