sim · medium · offline-rl · metric · varies

Agile Reinforcement Learning through Separable Neural Architecture

Description

Deep reinforcement learning (RL) is increasingly deployed in resource-constrained environments, yet the go-to function approximators, multilayer perceptrons (MLPs), are often parameter-inefficient due to an imperfect inductive bias for the smooth structure of many value functions. This mismatch can also hinder sample efficiency and slow policy learning in this capacity-limited regime. Although model compression techniques exist, they operate post-hoc and do not improve learning efficiency. Rec

Source

http://arxiv.org/abs/2601.23225v1