sim · medium · offline-rl · metric · varies

Offline Reinforcement Learning with Penalized Action Noise Injection

Description

Offline reinforcement learning (RL) optimizes a policy using only a fixed dataset, making it a practical approach in scenarios where interaction with the environment is costly. Due to this limitation, generalization ability is key to improving the performance of offline RL algorithms, as demonstrated by recent successes of offline RL with diffusion models. However, it remains questionable whether such diffusion models are necessary for highly performing offline RL algorithms, given their signifi

Source

http://arxiv.org/abs/2507.02356v1