sim · medium · atari · metric · varies

Transformer-based World Models Are Happy With 100k Interactions

Description

Deep neural networks have been successful in many reinforcement learning settings. However, compared to human learners they are overly data hungry. To build a sample-efficient world model, we apply a transformer to real-world episodes in an autoregressive manner: not only the compact latent states and the taken actions but also the experienced or predicted rewards are fed into the transformer, so that it can attend flexibly to all three modalities at different time steps. The transformer allows
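The input layout described above can be illustrated with a small sketch. It is not the paper's implementation: the token ordering (state, action, reward within each time step), the helper names `interleave`, `causal_mask`, and `visible_tokens`, and the use of plain tuples in place of embedding vectors are all assumptions made for illustration. It only shows how three modalities can share one autoregressive sequence so that each position may attend to every earlier state, action, or reward.

```python
from typing import List, Tuple

# A token is represented here as (modality, time step); a real model would
# use embedding vectors instead. This representation is an assumption.
Token = Tuple[str, int]

MODALITIES = ("state", "action", "reward")  # assumed per-step ordering


def interleave(num_steps: int) -> List[Token]:
    """Lay out tokens as state_0, action_0, reward_0, state_1, ..."""
    return [(m, t) for t in range(num_steps) for m in MODALITIES]


def causal_mask(n: int) -> List[List[bool]]:
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def visible_tokens(tokens: List[Token], mask: List[List[bool]], i: int) -> List[Token]:
    """Return the tokens that position i is allowed to attend to."""
    return [tok for j, tok in enumerate(tokens) if mask[i][j]]


tokens = interleave(3)
mask = causal_mask(len(tokens))

# The action token at time step 1 sits at index 4 and can see every
# earlier state, action, and reward, plus itself.
print(visible_tokens(tokens, mask, 4))
```

Under this layout a sequence of T time steps yields 3T tokens, and the causal mask is what keeps the prediction autoregressive: no position can attend to a later state, action, or reward.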

Source

http://arxiv.org/abs/2303.07109v1