simmediumpolicy-learningmetric · varies

Regularized Latent Dynamics Prediction is a Strong Baseline For Behavioral Foundation Models

Description

Behavioral Foundation Models (BFMs) produce agents with the capability to adapt to any unknown reward or task. These methods, however, are only able to produce near-optimal policies for the reward functions that are in the span of some pre-existing state features, making the choice of state features crucial to the expressivity of the BFM. As a result, BFMs are trained using a variety of complex objectives and require sufficient dataset coverage, to train task-useful spanning features. In this wo

Source

http://arxiv.org/abs/2603.15857v1