sim · medium · offline-rl · metric · varies

Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies

Description

With the rapid development of e-commerce, auto-bidding has become a key tool for optimizing advertising performance across diverse advertiser environments. Current approaches focus on reinforcement learning (RL) and generative models. These methods imitate offline historical behavior using complex architectures that require expensive hyperparameter tuning. Suboptimal trajectories in the historical data further exacerbate the difficulty of policy learning. To address these challenges, we propose QGA, a novel Q-

Source

http://arxiv.org/abs/2601.02754v2