sim medium offline-rl metric · varies
VIPO: Value Function Inconsistency Penalized Offline Reinforcement Learning
Description
Offline reinforcement learning (RL) learns effective policies from pre-collected datasets, offering a practical solution for applications where online interactions are risky or costly. Model-based approaches are particularly advantageous for offline RL, owing to their data efficiency and generalizability. However, due to inherent model errors, model-based methods often artificially introduce conservatism guided by heuristic uncertainty estimation, which can be unreliable. In this paper, we intro