sim · medium · offline-rl · metric · varies

Flow Actor-Critic for Offline Reinforcement Learning

Description

Datasets in offline reinforcement learning (RL) often follow complex, multi-modal distributions, so capturing them requires policies more expressive than the widely used Gaussian policies. To handle such datasets, in this paper we propose Flow Actor-Critic, a new actor-critic method for offline RL based on recent flow policies. The proposed method not only uses the flow model for the actor, as in previous flow policies, but also exploits the exp
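As a rough illustration of what a flow policy can look like, the sketch below assumes the common flow-matching construction: a state-conditioned velocity field is trained on straight paths from Gaussian noise to dataset actions, and an action is sampled by Euler integration from t = 0 to t = 1. This is not taken from the paper, and it does not cover how Flow Actor-Critic uses the flow model beyond the actor. The names `euler_sample`, `flow_matching_loss`, and `toy_velocity` are hypothetical, and `toy_velocity` is a closed-form stand-in for a learned network.

```python
import numpy as np


def euler_sample(velocity_fn, state, noise, num_steps=10):
    """Sample actions by integrating da/dt = velocity_fn(state, a, t) from t=0 to t=1."""
    action = np.array(noise, dtype=float)  # start from the noise sample (copied)
    dt = 1.0 / num_steps
    for k in range(num_steps):
        t = k * dt
        action = action + dt * velocity_fn(state, action, t)
    return action


def flow_matching_loss(velocity_fn, state, data_action, noise, t):
    """Conditional flow-matching loss on the straight path from noise to data."""
    t = np.asarray(t, dtype=float)[:, None]        # shape (batch, 1) for broadcasting
    interp = (1.0 - t) * noise + t * data_action   # point on the path at time t
    target = data_action - noise                   # constant velocity of that path
    pred = velocity_fn(state, interp, t)
    return float(np.mean(np.sum((pred - target) ** 2, axis=-1)))


def toy_velocity(state, action, t):
    """Hypothetical stand-in for a learned network: transports any noise sample
    to a target action equal to the state itself (an assumption made only so the
    example has a closed form)."""
    return (state - action) / (1.0 - t)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    state = np.array([[0.5, -0.5], [1.0, 2.0]])
    noise = rng.standard_normal(state.shape)
    actions = euler_sample(toy_velocity, state, noise, num_steps=8)
    print(actions)
```

In practice the velocity field would be a neural network trained by minimizing `flow_matching_loss` over dataset transitions. Unlike a Gaussian policy, the sampled action is a nonlinear transformation of the noise, so the resulting action distribution can have several modes.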

Source

http://arxiv.org/abs/2602.18015v1