HUANG Pei. Actor-Critic algorithm based on generalized advantage estimation in continuous time and space[J]. Journal of Neijiang Normal University, 2026, 41(4): 29-35. DOI: 10.13603/j.cnki.51-1621/z.2026.04.005

    Actor-Critic algorithm based on generalized advantage estimation in continuous time and space

    • This paper proposes an Actor-Critic algorithm that combines continuous-time reinforcement learning with Generalized Advantage Estimation (GAE) to address two problems common in conventional continuous-time methods: high policy-gradient variance and unstable convergence. By extending GAE's multi-step advantage estimation to the continuous-time domain, we redefine the advantage function in integral form and use it to improve both the policy evaluation (PE) and policy gradient (PG) steps. This reduces variance while preserving the accuracy of the continuous dynamics. In experiments in the MuJoCo Ant-v4 simulation environment, the improved algorithm converges at the same speed as the baseline while significantly reducing reward variance. The proposed algorithm shows considerable potential for complex control tasks with continuous action spaces and sparse rewards.
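The abstract does not state the paper's integral-form advantage, so the following is only a sketch under explicit assumptions, not the author's method. It assumes rewards are sampled as rates r(t_k) on a fixed grid of step `dt`, and that the discount factor and the GAE trace parameter are written in terms of rates as γ = exp(-β·dt) and λ = exp(-κ·dt). The function name `continuous_time_gae` and the parameters `beta` and `kappa` are hypothetical. On that grid it computes ordinary discrete GAE. As dt → 0 the sum approaches ∫_t^∞ exp(-(β+κ)(s-t)) δ(s) ds, where δ(s) = r(s) - βV(x(s)) + dV/ds is the continuous-time TD error. That is one natural integral form of the advantage, but not necessarily the one the paper derives.

```python
import numpy as np


def continuous_time_gae(rewards, values, dt, beta, kappa):
    """Riemann-sum sketch of a continuous-time GAE (hypothetical helper).

    rewards: reward *rates* r(t_k) for k = 0..T-1, shape (T,)
    values:  critic estimates V(x(t_k)) for k = 0..T, shape (T+1,);
             the last entry bootstraps the tail of the integral
    dt:      sampling interval of the time grid
    beta:    assumed discount *rate*, gamma = exp(-beta * dt)
    kappa:   assumed trace-decay *rate*, lambda = exp(-kappa * dt)
    """
    rewards = np.asarray(rewards, dtype=float)
    values = np.asarray(values, dtype=float)
    if values.shape[0] != rewards.shape[0] + 1:
        raise ValueError("values must have exactly one more entry than rewards")

    gamma = np.exp(-beta * dt)
    lam = np.exp(-kappa * dt)

    # One-step TD errors: r*dt approximates the reward integral over
    # [t_k, t_k + dt]; to first order in dt each delta equals
    # dt * (r - beta*V + dV/dt), the continuous-time TD error times dt.
    deltas = rewards * dt + gamma * values[1:] - values[:-1]

    # Backward recursion A_k = delta_k + (gamma*lambda) * A_{k+1}, i.e. a
    # discretised integral with kernel exp(-(beta + kappa) * (s - t)).
    adv = np.zeros_like(rewards)
    running = 0.0
    for k in reversed(range(rewards.shape[0])):
        running = deltas[k] + gamma * lam * running
        adv[k] = running
    return adv


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    r = rng.normal(size=50)   # synthetic reward rates, illustration only
    v = rng.normal(size=51)   # synthetic critic values, illustration only
    print(continuous_time_gae(r, v, dt=0.02, beta=1.0, kappa=5.0)[:5])
```

Setting `kappa = 0` (λ = 1) turns the estimate into a discounted Monte Carlo advantage with high variance. A very large `kappa` (λ → 0) reduces it to the one-step TD error, which has low variance but is biased by the critic. Intermediate values trade one against the other, which is the variance-reduction mechanism the abstract attributes to GAE.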