
Adapting Critic Match Loss Landscape Visualization to Off-policy Reinforcement Learning

Description

This work extends an established critic match loss landscape visualization method from online to off-policy reinforcement learning (RL), aiming to reveal the optimization geometry behind critic learning. Off-policy RL differs from stepwise online actor-critic learning in two structural respects: its replay-based data flow and its target computation. Based on these two differences, the critic match loss landscape visualization method is adapted to the Soft Actor-Critic (SAC) algorithm by aligning the loss evaluation with the replay-based data flow and target computation.
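As a rough illustration of the kind of adaptation described, the sketch below evaluates a SAC-style critic loss on a *fixed* replay batch with a *frozen* target network, along two norm-matched random directions in parameter space (in the spirit of filter-normalized loss landscape visualization). Everything here is a toy assumption, not the paper's method: the critic is a linear function, the batch is synthetic, and all names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup: linear critic Q(s, a) = w . [s; a] on a synthetic replay batch.
state_dim, action_dim, batch = 4, 2, 256
gamma = 0.99        # discount factor
alpha_ent = 0.2     # SAC entropy temperature (illustrative value)

S  = rng.normal(size=(batch, state_dim))
A  = rng.normal(size=(batch, action_dim))
R  = rng.normal(size=batch)
S2 = rng.normal(size=(batch, state_dim))
A2 = rng.normal(size=(batch, action_dim))   # next actions from the (frozen) policy
logp2 = 0.1 * rng.normal(size=batch)        # frozen log-probs for the entropy term

w  = rng.normal(size=state_dim + action_dim)   # critic parameters (anchor point)
wt = w + 0.01 * rng.normal(size=w.shape)       # frozen target-critic parameters

X  = np.concatenate([S, A], axis=1)
X2 = np.concatenate([S2, A2], axis=1)

# Frozen soft TD target: r + gamma * (Q_target(s', a') - alpha * log pi(a'|s')).
y = R + gamma * (X2 @ wt - alpha_ent * logp2)

def critic_loss(params):
    """Mean squared soft Bellman error on the fixed replay batch."""
    return float(np.mean((X @ params - y) ** 2))

# Two random probe directions, norm-matched to the anchor parameters
# (filter normalization reduces to this for a single weight vector).
d1 = rng.normal(size=w.shape); d1 *= np.linalg.norm(w) / np.linalg.norm(d1)
d2 = rng.normal(size=w.shape); d2 *= np.linalg.norm(w) / np.linalg.norm(d2)

coords = np.linspace(-1.0, 1.0, 21)
surface = np.array([[critic_loss(w + a * d1 + b * d2) for b in coords]
                    for a in coords])
print(surface.shape)  # a 2-D grid of loss values, ready for a contour plot
```

Holding the batch and targets fixed is what makes the surface well defined in the off-policy setting; re-sampling either per evaluation would entangle data noise with the geometry being visualized.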

Source

http://arxiv.org/abs/2603.14589v1