sim · medium · rl · metric: varies

Pseudo-Quantized Actor-Critic Algorithm for Robustness to Noisy Temporal Difference Error

Description

In reinforcement learning (RL), temporal difference (TD) errors are widely used to optimize value and policy functions. However, because the TD error is defined through bootstrapping, its computation tends to be noisy, which can destabilize learning. Heuristics for improving the accuracy of TD errors, such as target networks and ensemble models, have been introduced. While these approaches are essential to current deep RL algorithms, they cause side effects such as increased computational cost.
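
The abstract refers to the one-step TD error, δ_t = r_t + γ V(s_{t+1}) − V(s_t), whose bootstrap target makes it noisy, and to the target-network heuristic used to stabilize it. Below is a minimal NumPy sketch of both ideas on a toy five-state chain; the environment, constants, and soft-update rate are illustrative assumptions, and the paper's pseudo-quantization itself is not reproduced here since the abstract does not specify it.

import numpy as np

# Minimal sketch: one-step TD error with a slowly updated target copy of
# the value estimates (a tabular stand-in for a target network), on a
# hypothetical 5-state chain MDP. Not the paper's algorithm.

rng = np.random.default_rng(0)

n_states = 5
gamma = 0.99   # discount factor
alpha = 0.1    # learning rate
tau = 0.01     # Polyak averaging rate for the target values

V = np.zeros(n_states)          # online value estimates
V_target = np.zeros(n_states)   # target copy, updated slowly

for step in range(10_000):
    s = rng.integers(n_states)
    s_next = min(s + 1, n_states - 1)           # deterministic right move
    r = 1.0 if s_next == n_states - 1 else 0.0  # reward at terminal state
    done = s_next == n_states - 1

    # The bootstrap target uses the slowly moving target values, which
    # damps the feedback loop that makes the raw TD error noisy.
    bootstrap = 0.0 if done else gamma * V_target[s_next]
    td_error = r + bootstrap - V[s]

    V[s] += alpha * td_error

    # Polyak (soft) update of the target values.
    V_target = (1 - tau) * V_target + tau * V

print(np.round(V, 3))

With tau = 1 the target values track the online estimates exactly and the bootstrap target becomes as noisy as the raw estimate; a smaller tau trades tracking speed for a steadier TD error, which is the stabilization trade-off the abstract alludes to.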

Source

http://arxiv.org/abs/2604.01613v1