simmediumrlmetric · varies

When Rubrics Fail: Error Enumeration as Reward in Reference-Free RL Post-Training for Virtual Try-On

Description

Reinforcement learning with verifiable rewards (RLVR) and Rubrics as Rewards (RaR) have driven strong gains in domains with clear correctness signals and even in subjective domains by synthesizing evaluation criteria from ideal reference answers. But many real-world tasks admit multiple valid outputs and lack the single ideal answer that rubric generation depends on. We identify this reference-free setting as a gap in current post-training methods and propose Implicit Error Counting (IEC) to fil

Source

http://arxiv.org/abs/2603.05659v2