sim · medium · offline-rl · metric: varies
GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA
Description
This paper proposes Group-Relative Implicit Fine-Tuning (GIFT), a reinforcement learning framework for aligning large language models (LLMs) that unifies on-policy optimization with implicit preference learning. GIFT combines three key elements: (1) group-based sampling and normalization from GRPO, (2) the implicit reward formulation of DPO, and (3) the training principle underlying UNA. The central idea is to transform reward maximization into a group-wise reward matching probl