
GIFT: Group-Relative Implicit Fine-Tuning Integrates GRPO with DPO and UNA

Description

This paper proposes Group-Relative Implicit Fine-Tuning (GIFT), a reinforcement learning framework for aligning large language models (LLMs) that unifies on-policy optimization with implicit preference learning. GIFT combines three key elements: (1) group-based sampling and normalization from GRPO, (2) the implicit reward formulation of DPO, and (3) the training principle underlying UNA. The central idea is to transform reward maximization into a group-wise reward matching probl
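The three ingredients listed in the description can be illustrated with a small, self-contained sketch. It is not the paper's implementation. It assumes that the implicit reward takes the usual DPO form, beta times the log-probability ratio between the policy and a reference model, that both implicit and explicit rewards are standardized within each group of sampled responses as in GRPO, and that "matching" is measured with a mean squared error. The function names, the `beta` value, and the `eps` constant are all hypothetical choices made for illustration.

```python
import math


def group_normalize(values, eps=1e-8):
    """Standardize values within one group (zero mean, unit variance)."""
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / n)
    # eps guards against division by zero when all values are equal.
    return [(v - mean) / (std + eps) for v in values]


def implicit_rewards(logp_policy, logp_ref, beta=0.1):
    """DPO-style implicit reward: beta * (log pi(y|x) - log pi_ref(y|x))."""
    return [beta * (lp - lr) for lp, lr in zip(logp_policy, logp_ref)]


def group_matching_loss(logp_policy, logp_ref, explicit_rewards, beta=0.1):
    """Assumed objective: MSE between group-normalized implicit and
    explicit rewards for the responses sampled for a single prompt."""
    r_implicit = group_normalize(implicit_rewards(logp_policy, logp_ref, beta))
    r_explicit = group_normalize(explicit_rewards)
    squared = [(a - b) ** 2 for a, b in zip(r_implicit, r_explicit)]
    return sum(squared) / len(squared)


# One prompt, three sampled responses (all numbers are made up).
logp_policy = [-1.0, -2.0, -3.0]   # sequence log-probs under the policy
logp_ref = [-2.0, -2.0, -2.0]      # sequence log-probs under the reference
explicit = [3.0, 2.0, 1.0]         # scores from a reward model

aligned_loss = group_matching_loss(logp_policy, logp_ref, explicit)
reversed_loss = group_matching_loss(logp_policy, logp_ref, explicit[::-1])
```

In this toy group the policy's log-ratio ordering agrees with the explicit rewards, so `aligned_loss` is close to zero, while reversing the explicit rewards gives a much larger `reversed_loss`. Because both sides are normalized within the group, the loss under these assumptions depends only on relative standing inside the group, not on the absolute scale of either reward.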

Source

http://arxiv.org/abs/2510.23868v4