sim | medium | offline-rl | metric · varies

Unleashing Flow Policies with Distributional Critics

Description

Flow-based policies have recently emerged as a powerful tool in offline and offline-to-online reinforcement learning, capable of modeling the complex, multimodal behaviors found in pre-collected datasets. However, the full potential of these expressive actors is often bottlenecked by their critics, which typically learn a single, scalar estimate of the expected return. To address this limitation, we introduce the Distributional Flow Critic (DFC), a novel critic architecture that learns the compl

Source

http://arxiv.org/abs/2509.23087v1