sim · medium · atari · metric · varies

Faster Deep Reinforcement Learning with Slower Online Network

Description

Deep reinforcement learning algorithms often use two networks for value function optimization: an online network, and a target network that tracks the online network with some delay. Using two separate networks enables the agent to hedge against issues that arise when performing bootstrapping. In this paper we endow two popular deep reinforcement learning algorithms, namely DQN and Rainbow, with updates that incentivize the online network to remain in the proximity of the target network. This im
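
As a rough illustration of what "remaining in the proximity of the target network" could look like, the sketch below takes one gradient step on a loss of the form TD loss plus a quadratic penalty on the distance between the online and target weights. The penalty form `(1 / (2c)) * ||online - target||^2`, the names `proximal_step` and `sync_target`, and the parameters `lr` and `c` are all assumptions made for this example; the description above does not specify them, so this should not be read as the paper's exact update.

```python
def proximal_step(online, target, td_grad, lr=0.1, c=1.0):
    """One gradient step on: td_loss(online) + (1 / (2c)) * ||online - target||^2.

    online, target: lists of floats (flattened weights; hypothetical layout).
    td_grad: gradient of the usual TD loss w.r.t. the online weights.
    c: assumed proximity coefficient; smaller c pulls harder toward the target.
    """
    # The gradient of the assumed penalty w.r.t. each online weight is (w - t) / c.
    return [w - lr * (g + (w - t) / c) for w, t, g in zip(online, target, td_grad)]


def sync_target(online):
    """Periodic target update: copy the online weights into the target network."""
    return list(online)


online = [1.0, -2.0]
target = [0.0, 0.0]

# With a zero TD gradient, the step only shrinks the gap to the target weights.
new_online = proximal_step(online, target, td_grad=[0.0, 0.0])
new_target = sync_target(new_online)
```

In this sketch, a smaller `c` keeps the online weights closer to the target, while a very large `c` makes the penalty negligible and leaves an ordinary gradient step on the TD loss.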

Source

http://arxiv.org/abs/2112.05848v3