Actor-Critic algorithm based on generalized advantage estimation in continuous time and space
Abstract
This paper proposes an Actor-Critic algorithm that combines continuous-time reinforcement learning with Generalized Advantage Estimation (GAE) to address the high policy-gradient variance and unstable convergence common in conventional continuous-time methods. We extend the multi-step advantage estimation of GAE to the continuous-time domain by redefining the advantage function in integral form, and we use this definition to optimize both the policy evaluation (PE) and policy gradient (PG) steps. The approach reduces variance while preserving the accuracy of the continuous dynamics. In experiments on the MuJoCo Ant-v4 simulation environment, the improved algorithm substantially reduces reward variance while converging at the same speed. The proposed algorithm shows considerable potential for complex control problems with continuous action spaces and sparse rewards.
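For orientation, the following is a minimal sketch of the standard discrete-time GAE recursion that the proposed method takes as its starting point. It is not the continuous-time integral form developed in this paper. The function name `gae_advantages`, its argument layout, and the default values of `gamma` and `lam` are illustrative assumptions.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Standard discrete-time GAE (illustrative sketch, not the paper's method).

    rewards: list of T rewards r_0 .. r_{T-1}
    values:  list of T+1 value estimates V(s_0) .. V(s_T); the last entry
             is the bootstrap value for the state after the final step
    Returns a list of T advantage estimates.
    """
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    advantages = [0.0] * len(rewards)
    running = 0.0
    # Backward recursion: A_t = delta_t + gamma * lam * A_{t+1}
    for t in reversed(range(len(rewards))):
        # One-step temporal-difference residual
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

Setting `lam=0` reduces the estimate to the one-step TD residual (low variance, higher bias), while `lam=1` recovers the discounted sum of TD residuals (lower bias, higher variance); this trade-off is the one the abstract refers to when it speaks of variance reduction.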