
Swift-Sarsa: Fast and Robust Linear Control

Description

Javed, Sharifnassab, and Sutton (2024) introduced SwiftTD, a new algorithm for TD learning that augments True Online TD($λ$) with step-size optimization, a bound on the effective learning rate, and step-size decay. In their experiments, SwiftTD outperformed True Online TD($λ$) and TD($λ$) on a variety of prediction tasks derived from Atari games, and its performance was robust to the choice of hyper-parameters. In this extended abstract we extend SwiftTD to work for control problems. We comb
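For orientation, the baseline that SwiftTD augments can be sketched as follows. This is a minimal sketch of True Online TD($λ$) with linear features, following the standard textbook form of the update. It is not SwiftTD or Swift-Sarsa: step-size optimization, the bound on the effective learning rate, and step-size decay are all omitted, since the abstract does not spell them out. The function names, argument names, and default values are illustrative assumptions, not taken from the paper.

```python
def dot(a, b):
    """Inner product of two equal-length lists of floats."""
    return sum(ai * bi for ai, bi in zip(a, b))


def true_online_td_lambda(transitions, n_features, alpha=0.1, gamma=0.99, lam=0.9):
    """Run one episode of True Online TD(lambda) with a linear value function.

    transitions: iterable of (x, reward, x_next), where x and x_next are
    feature vectors (lists of floats). By convention (an assumption of this
    sketch), x_next is all zeros when the next state is terminal.
    Returns the learned weight vector.
    """
    w = [0.0] * n_features  # weights of the linear value estimate
    z = [0.0] * n_features  # dutch-style eligibility trace
    v_old = 0.0             # value of the current state under the previous weights

    for x, reward, x_next in transitions:
        v = dot(w, x)
        v_next = dot(w, x_next)
        delta = reward + gamma * v_next - v  # TD error

        # Dutch trace update; z.x is computed with the old trace.
        zx = dot(z, x)
        z = [gamma * lam * zi + (1.0 - alpha * gamma * lam * zx) * xi
             for zi, xi in zip(z, x)]

        # Weight update with the true-online correction terms.
        w = [wi + alpha * (delta + v - v_old) * zi - alpha * (v - v_old) * xi
             for wi, zi, xi in zip(w, z, x)]

        v_old = v_next

    return w
```

A control variant in the Sarsa style would typically apply the same update to state-action features, with `x_next` built from the next state and the action actually selected there; how Swift-Sarsa does this in detail is not described in the text above.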

Source

http://arxiv.org/abs/2507.19539v1