sim · medium · offline-rl · metric · varies

Learning Optimal and Sample-Efficient Decision Policies with Guarantees

Description

Reinforcement learning (RL) and deep learning have revolutionised decision-making. Although this has led to significant progress in domains such as robotics, healthcare, and finance, using RL in practice remains challenging, particularly when learning decision policies for high-stakes applications that may require guarantees. Traditional RL algorithms rely on a large number of online interactions with the environment, which is problematic in scenarios where online interactions a

Source

http://arxiv.org/abs/2602.17978v1