sim · medium · atari · metric · varies
SOAP-RL: Sequential Option Advantage Propagation for Reinforcement Learning in POMDP Environments
Description
This work compares ways of extending Reinforcement Learning algorithms to Partially Observed Markov Decision Processes (POMDPs) with options. One view treats options as temporally extended actions, which can be realized as a memory that lets the agent retain historical information beyond the policy's context window. While option assignment could be handled using heuristics and hand-crafted objectives, learning temporally consistent options and associated sub-policies without explicit supervis