SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments

Description

This work compares ways of extending reinforcement learning algorithms to Partially Observed Markov Decision Processes (POMDPs) with options. Options can be viewed as temporally extended actions, realized as a memory that lets the agent retain historical information beyond the policy's context window. While option assignment could be handled using heuristics and hand-crafted objectives, learning temporally consistent options and associated sub-policies without explicit supervis
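
To make the "option as memory" view concrete, here is a minimal toy sketch. It is not the method studied in the paper. It assumes a hypothetical corridor task in which a cue is shown only at the first step and the correct turn must be made several steps later, and it assigns options with a hand-written rule, which is the kind of heuristic the description contrasts with learned assignment. The names `OptionAgent`, `run_episode`, and the observation strings are all illustrative assumptions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class OptionAgent:
    """Toy agent whose only memory is a discrete option index (hypothetical)."""

    option: int = 0  # 0 = no cue seen yet (assumed default)

    def step(self, obs: str) -> str:
        # Option assignment: a hand-crafted rule standing in for a learned one.
        if obs == "cue_left":
            self.option = 1
        elif obs == "cue_right":
            self.option = 2

        # Sub-policy: the action depends on the current observation AND the option.
        # The policy's "context window" is a single observation, so the option
        # is the only place where the earlier cue can survive.
        if obs == "junction":
            return {1: "left", 2: "right"}.get(self.option, "forward")
        return "forward"


def run_episode(agent: OptionAgent, observations: List[str]) -> List[str]:
    """Feed observations one at a time and collect the chosen actions."""
    return [agent.step(obs) for obs in observations]


# The cue appears only at step 0; the decision happens at the final step.
left_actions = run_episode(
    OptionAgent(), ["cue_left", "corridor", "corridor", "junction"]
)
right_actions = run_episode(
    OptionAgent(), ["cue_right", "corridor", "corridor", "junction"]
)
no_cue_actions = run_episode(OptionAgent(), ["corridor", "junction"])
```

In this sketch the option persists across steps once set, which is what makes it temporally extended. Replacing the hand-written assignment rule with one learned from reward alone is the harder problem the description refers to.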

Source

http://arxiv.org/abs/2407.18913v2