sim · medium · atari · metric · varies

Towards Hyperparameter-free Policy Selection for Offline Reinforcement Learning

Description

How to select between policies and value functions produced by different training algorithms in offline reinforcement learning (RL) -- which is crucial for hyperparameter tuning -- is an important open question. Existing approaches based on off-policy evaluation (OPE) often require additional function approximation and hence hyperparameters, creating a chicken-and-egg situation. In this paper, we design hyperparameter-free algorithms for policy selection based on BVFT [XJ21], a recent theoretic

Source

http://arxiv.org/abs/2110.14000v3