sim · medium · atari · metric · varies

Addressing Optimism Bias in Sequence Modeling for Reinforcement Learning

Description

Impressive results in natural language processing (NLP) based on the Transformer neural network architecture have inspired researchers to explore viewing offline reinforcement learning (RL) as a generic sequence modeling problem. Recent works based on this paradigm have achieved state-of-the-art results in several of the mostly deterministic offline Atari and D4RL benchmarks. However, because these methods jointly model the states and actions as a single sequencing problem, they struggle to dise
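As a rough illustration of what "jointly modeling the states and actions as a single sequence" can look like, the sketch below flattens a toy trajectory into one token stream and builds next-token prediction pairs from it. The token values, the `flatten_trajectory` and `next_token_pairs` helpers, and the two-token state encoding are hypothetical choices made for this example; they are not taken from the paper or from any specific library.

```python
from typing import List, Tuple

# Hypothetical encoding: each step is (state tokens, action token).
Step = Tuple[List[int], int]


def flatten_trajectory(steps: List[Step]) -> List[int]:
    """Interleave state and action tokens into one flat sequence."""
    tokens: List[int] = []
    for state_tokens, action_token in steps:
        tokens.extend(state_tokens)   # tokens describing the environment state
        tokens.append(action_token)   # token describing the agent's action
    return tokens


def next_token_pairs(tokens: List[int]) -> List[Tuple[List[int], int]]:
    """Build (context, target) pairs for autoregressive training."""
    return [(tokens[:i], tokens[i]) for i in range(1, len(tokens))]


# Toy trajectory with two steps; each state is encoded by two tokens.
trajectory: List[Step] = [([3, 7], 1), ([4, 7], 0)]
sequence = flatten_trajectory(trajectory)
pairs = next_token_pairs(sequence)

print(sequence)
for context, target in pairs:
    print(context, "->", target)
```

In this layout the same prediction objective covers both action tokens, which the agent chooses, and state tokens, which the environment produces; nothing in the flat sequence marks which is which.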

Source

http://arxiv.org/abs/2207.10295v1