Actor-Critic algorithm based on generalized advantage estimation in continuous time and space

HUANG Pei

doi:10.13603/j.cnki.51-1621/z.2026.04.005

HUANG Pei. Actor-Critic algorithm based on generalized advantage estimation in continuous time and space[J]. Journal of Neijiang Normal University, 2026, 41(4): 29-35. DOI: 10.13603/j.cnki.51-1621/z.2026.04.005

Abstract

This paper proposes an Actor-Critic algorithm that combines continuous-time reinforcement learning with Generalized Advantage Estimation (GAE) to address the high policy-gradient variance and unstable convergence common in conventional continuous-time methods. We extend GAE's multi-step advantage estimation to the continuous-time domain, redefine the advantage function in integral form, and use it to optimize both the policy evaluation (PE) and policy gradient (PG) steps. This reduces variance while preserving the accuracy of the continuous dynamics. In experiments in the MuJoCo Ant-v4 simulation environment, the improved algorithm significantly reduces reward variance at the same convergence speed. The proposed algorithm shows substantial potential for complex control tasks with continuous action spaces and sparse rewards.
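The abstract does not state the integral form of the advantage explicitly. The following is a minimal sketch under our own assumptions, not the paper's formula. It assumes the continuous-time TD error δ(t) = r(t) - βV(x(t)) + dV/dt, with discount rate β. It also assumes an exponentially weighted advantage A(t) = ∫_0^∞ e^{-(β+κ)s} δ(t+s) ds. Here κ is a hypothetical extra decay rate that plays the role of GAE's λ (λ = e^{-κΔt}). The function name `continuous_time_gae` and its parameters are illustrative. The integral is approximated by a left Riemann sum over samples taken every Δt, truncated at the end of the trajectory.

```python
import math
from typing import List, Sequence


def continuous_time_gae(
    rewards: Sequence[float],  # reward rate r(t_k) at each sample time t_k = k * dt
    values: Sequence[float],   # critic estimates V(x(t_k)); one extra bootstrap entry at the end
    dt: float,                 # sampling interval (Delta t)
    beta: float,               # discount rate (assumed; discrete gamma = exp(-beta * dt))
    kappa: float,              # hypothetical decay rate standing in for GAE's lambda
) -> List[float]:
    """Hypothetical discretization of A(t) = integral_0^inf e^{-(beta+kappa)s} delta(t+s) ds."""
    if len(values) != len(rewards) + 1:
        raise ValueError("values must have exactly one more entry than rewards")
    n = len(rewards)
    # Continuous-time TD error, with dV/dt approximated by a forward difference.
    deltas = [
        rewards[k] - beta * values[k] + (values[k + 1] - values[k]) / dt
        for k in range(n)
    ]
    decay = math.exp(-(beta + kappa) * dt)  # per-step weight e^{-(beta+kappa) dt}
    advantages = [0.0] * n
    running = 0.0
    # Backward recursion A_k = delta_k * dt + decay * A_{k+1}, which equals the
    # truncated left Riemann sum of the exponentially weighted integral.
    for k in reversed(range(n)):
        running = deltas[k] * dt + decay * running
        advantages[k] = running
    return advantages


# Usage: constant reward 1.0 with a critic that already predicts r / beta = 2.0
# gives zero TD error everywhere, so every advantage is 0.0.
print(continuous_time_gae([1.0] * 5, [2.0] * 6, dt=0.1, beta=0.5, kappa=1.0))
```

The backward recursion computes all advantages in one O(n) pass, the same way discrete GAE is usually implemented. Larger κ shortens the effective averaging window, which lowers variance at the cost of more bias from the critic. In an actor-critic update, these advantages would weight the log-policy gradient terms.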
