sim medium offline-rl metric · varies

Sample-Efficient Policy Constraint Offline Deep Reinforcement Learning based on Sample Filtering

Description

Offline reinforcement learning (RL) aims to learn a policy that maximizes the expected return from a given static dataset of transitions. However, offline RL suffers from the distribution shift problem, which policy constraint offline RL methods were proposed to address. During policy constraint offline RL training, it is important to keep the difference between the learned policy and the behavior policy within a given threshold. Thus, the learned policy heavily relies on the
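
To illustrate the policy constraint idea described above, here is a minimal sketch that checks whether a learned policy stays within a threshold of the behavior policy in a single state. It is not the paper's method: the choice of KL divergence as the "difference", the discrete action space, the threshold value, and the names `kl_divergence` and `satisfies_constraint` are all assumptions made for illustration.

```python
import math


def kl_divergence(p, q):
    """KL(p || q) for two discrete action distributions.

    Assumption: both are lists of probabilities over the same actions,
    and q is strictly positive wherever p is positive.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0.0)


def satisfies_constraint(learned, behavior, threshold):
    """Return True if the learned policy is within `threshold` of the behavior policy.

    Hypothetical helper: KL divergence stands in for the unspecified
    policy difference measure.
    """
    return kl_divergence(learned, behavior) <= threshold


# Action probabilities in one state (made-up numbers for illustration).
behavior = [0.5, 0.3, 0.2]
close_policy = [0.45, 0.35, 0.20]  # small deviation from the behavior policy
far_policy = [0.05, 0.05, 0.90]    # large deviation, likely out of distribution

print(satisfies_constraint(close_policy, behavior, threshold=0.05))
print(satisfies_constraint(far_policy, behavior, threshold=0.05))
```

In practice, methods of this kind typically fold such a constraint into the training objective as a penalty or a projection rather than checking it after the fact; the check above only shows what "within a given threshold" means for one state.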

Source

http://arxiv.org/abs/2512.20115v1