
Delayed Homomorphic Reinforcement Learning for Environments with Delayed Feedback

Description

Reinforcement learning in real-world systems often has to cope with delayed feedback, which breaks the Markov assumption and impedes both learning and control. Canonical state-augmentation approaches restore the Markov property but cause a state-space explosion, which imposes a severe sample-complexity burden. Despite recent progress, state-of-the-art augmentation-based baselines remain incomplete: they either predominantly reduce the burden on the critic or treat the actor and critic in non-unified ways. To p
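
To make the state-space explosion concrete, here is a minimal sketch of canonical state augmentation. It illustrates the baseline idea only, not the paper's method. It assumes a discrete environment with a constant delay of `delay` steps, in which the agent augments the last observed state with the actions it has issued since. The names `augmented_state_count` and `ActionBuffer` are hypothetical and chosen for illustration.

```python
from collections import deque
from typing import Deque, Tuple


def augmented_state_count(num_states: int, num_actions: int, delay: int) -> int:
    """Number of augmented states, |S| * |A|**delay, under a constant delay."""
    return num_states * num_actions ** delay


class ActionBuffer:
    """Holds the last `delay` actions (hypothetical helper, for illustration).

    Pairing the most recently observed state with these pending actions
    gives the augmented state used by canonical augmentation approaches.
    """

    def __init__(self, delay: int, default_action: int = 0) -> None:
        # Pre-fill with a default action so the buffer always has `delay` entries.
        self.pending: Deque[int] = deque([default_action] * delay, maxlen=delay)

    def augment(self, observed_state: int) -> Tuple[int, Tuple[int, ...]]:
        """Return (observed_state, pending_actions) as the augmented state."""
        return observed_state, tuple(self.pending)

    def push(self, action: int) -> None:
        """Record a newly issued action; the oldest one drops out."""
        self.pending.append(action)
```

With 10 states and 4 actions, a delay of 3 steps already yields 10 * 4**3 = 640 augmented states, and each extra step of delay multiplies the count by the number of actions. This exponential growth is the sample-complexity burden the description refers to.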

Source

http://arxiv.org/abs/2604.03641v1