sim · medium · offline-rl · metric · varies

Efficient Anti-exploration via VQVAE and Fuzzy Clustering in Offline Reinforcement Learning

Description

Pseudo-counting is an effective anti-exploration method in offline reinforcement learning (RL): it counts state-action pairs and imposes a large penalty on rare or unseen pairs. Existing anti-exploration methods count continuous state-action pairs by first discretizing them, but the discretization step often suffers from the curse of dimensionality and from information loss. This reduces efficiency and performance, and can even cause policy learning to fail. In this paper,
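
To make the counting idea concrete, here is a minimal sketch of a pseudo-count penalty built on naive uniform binning. It illustrates the baseline approach the description criticizes, not the paper's VQVAE or fuzzy-clustering method. The function names (`discretize`, `count_pairs`, `penalty`), the bin range, and the penalty form `beta / sqrt(n + 1)` are all assumptions chosen for illustration.

```python
import math
from collections import Counter


def discretize(state, action, n_bins=10, low=-1.0, high=1.0):
    """Map a continuous state-action pair to a tuple of bin indices.

    Assumes every dimension lies roughly in [low, high]; values outside
    that range are clipped. This is plain uniform binning (hypothetical).
    """
    def bin_index(x):
        clipped = min(max(x, low), high)
        idx = int((clipped - low) / (high - low) * n_bins)
        return min(idx, n_bins - 1)  # keep x == high inside the last bin

    return tuple(bin_index(x) for x in (*state, *action))


def count_pairs(dataset, **kwargs):
    """Count how often each discretized state-action cell occurs."""
    return Counter(discretize(s, a, **kwargs) for s, a in dataset)


def penalty(counts, state, action, beta=1.0, **kwargs):
    """Penalty that shrinks as the cell count grows (assumed form).

    Unseen cells have count 0 and receive the maximum penalty, beta.
    """
    n = counts[discretize(state, action, **kwargs)]
    return beta / math.sqrt(n + 1)


# Toy offline dataset: two nearby pairs and one far-away pair.
dataset = [
    ((0.10, 0.25), (0.05,)),
    ((0.12, 0.27), (0.07,)),
    ((-0.90, 0.90), (0.55,)),
]
counts = count_pairs(dataset)

seen = penalty(counts, (0.11, 0.26), (0.06,))    # falls in a cell with 2 samples
unseen = penalty(counts, (0.75, -0.75), (-0.45,))  # falls in an empty cell
```

In this sketch the number of cells is `n_bins ** d` for a `d`-dimensional state-action vector, so the table grows exponentially with dimension, and all points inside one cell become indistinguishable. These are the two issues the description names: the curse of dimensionality and information loss.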

Source

http://arxiv.org/abs/2602.07889v1