sim · medium · offline-rl · metric: varies
What Fundamental Structure in Reward Functions Enables Efficient Sparse-Reward Learning?
Description
Sparse-reward reinforcement learning (RL) remains fundamentally hard: without structure, any agent needs $\Omega(|\mathcal{S}||\mathcal{A}|/p)$ samples to recover the rewards. We introduce Policy-Aware Matrix Completion (PAMC) as a first concrete step toward a structural reward-learning framework. Our key idea is to exploit approximate low-rank-plus-sparse structure in the reward matrix under policy-biased, missing-not-at-random (MNAR) sampling. We prove recovery guarantees with inverse-propensity weighting, and establish a visi
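To make the core idea concrete, here is a minimal sketch of inverse-propensity-weighted low-rank matrix completion: observed entries of a reward matrix are reweighted by the reciprocal of their observation probability so that the weighted reconstruction loss is unbiased under policy-biased sampling. The function name, the weighted alternating-least-squares solver, and all parameters are illustrative assumptions, not the paper's exact PAMC algorithm (which also handles a sparse component).

```python
import numpy as np

def ipw_matrix_completion(R_obs, mask, propensity, rank=2, iters=200, lam=1e-2, seed=0):
    """Recover a low-rank reward matrix from policy-biased observations.

    R_obs:      observed rewards (zeros where unobserved)
    mask:       1 where an entry was observed, 0 elsewhere
    propensity: per-entry observation probability under the behavior policy
    Hypothetical illustration: weighted alternating ridge regression, where
    each observed entry is weighted by 1/propensity (inverse-propensity
    weighting) so the loss is unbiased for the full-matrix loss under MNAR
    sampling.
    """
    rng = np.random.default_rng(seed)
    n, m = R_obs.shape
    U = 0.1 * rng.standard_normal((n, rank))
    V = 0.1 * rng.standard_normal((m, rank))
    # inverse-propensity weights; unobserved entries get weight 0 via the mask
    W = mask / np.clip(propensity, 1e-3, None)
    for _ in range(iters):
        for i in range(n):        # update row factors
            w = W[i]
            A = (V * w[:, None]).T @ V + lam * np.eye(rank)
            b = (V * w[:, None]).T @ R_obs[i]
            U[i] = np.linalg.solve(A, b)
        for j in range(m):        # update column factors
            w = W[:, j]
            A = (U * w[:, None]).T @ U + lam * np.eye(rank)
            b = (U * w[:, None]).T @ R_obs[:, j]
            V[j] = np.linalg.solve(A, b)
    return U @ V.T
```

On a synthetic rank-2 reward matrix with nonuniform observation probabilities, this sketch recovers the unobserved entries to low relative error, illustrating why low-rank structure plus propensity correction can beat the unstructured sample-complexity bound above.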