sim · medium · offline-rl · metric · varies

Use the Online Network If You Can: Towards Fast and Stable Reinforcement Learning

Description

The use of target networks is a popular approach for estimating value functions in deep Reinforcement Learning (RL). While effective, the target network remains a compromise solution that preserves stability at the cost of slowly moving targets, thus delaying learning. Conversely, using the online network as a bootstrapped target is intuitively appealing, albeit well known to lead to unstable learning. In this work, we aim to get the best of both worlds by introducing a novel update rule…
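To make the trade-off concrete, the sketch below contrasts the two conventional bootstrapping choices the description mentions. It does not implement the paper's new update rule, which the description does not spell out. The sketch assumes a tabular Q-table standing in for the deep network, a standard one-step Q-learning target r + γ max_a' Q(s', a'), and a hypothetical single-state self-loop MDP. The names `td_target` and `run` and the values of `gamma`, `lr` and `sync_every` are illustrative assumptions, not taken from the paper.

```python
import copy

def td_target(q_table, reward, next_state, done, gamma):
    """One-step bootstrapped target: r + gamma * max_a' Q(s', a'); just r if terminal."""
    if done:
        return reward
    return reward + gamma * max(q_table[next_state])

def run(num_steps, use_target, sync_every=100, gamma=0.9, lr=0.5):
    """Repeat one self-loop transition and record the TD target used at each step.

    A dict-of-lists Q-table stands in for the online network (an assumption);
    `target` is a periodically hard-synced copy of it (the target network).
    """
    # Hypothetical toy MDP: state 0, action 0, reward 1.0, back to state 0.
    transition = (0, 0, 1.0, 0, False)
    online = {0: [0.0, 0.0]}
    target = copy.deepcopy(online)
    targets = []
    for step in range(num_steps):
        if use_target and step % sync_every == 0:
            target = copy.deepcopy(online)  # hard sync of the target copy
        state, action, reward, next_state, done = transition
        # Bootstrap from the frozen copy or from the table being updated.
        bootstrap = target if use_target else online
        y = td_target(bootstrap, reward, next_state, done, gamma)
        # Move the online estimate toward the target (tabular TD update).
        online[state][action] += lr * (y - online[state][action])
        targets.append(y)
    return targets, online[0][0]

online_targets, q_online = run(3, use_target=False)
frozen_targets, q_frozen = run(3, use_target=True)
print([round(y, 4) for y in online_targets])  # [1.0, 1.45, 1.8775]
print([round(y, 4) for y in frozen_targets])  # [1.0, 1.0, 1.0]
```

With online bootstrapping the target moves after every update, so the estimate climbs faster (1.42625 vs. 0.875 after three steps here) but is chasing its own changing output. With the frozen copy the target stays fixed until the next sync, which is the "slowly moving targets" cost the description refers to.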

Source

http://arxiv.org/abs/2510.02590v1