sim · medium · offline-rl · metric · varies
Q-Regularized Generative Auto-Bidding: From Suboptimal Trajectories to Optimal Policies
Description
With the rapid development of e-commerce, auto-bidding has become a key asset in optimizing advertising performance under diverse advertiser environments. Current approaches focus on reinforcement learning (RL) and generative models. These efforts imitate offline historical behaviors using complex structures that require expensive hyperparameter tuning. Suboptimal trajectories further exacerbate the difficulty of policy learning. To address these challenges, we propose QGA, a novel Q-