dataset

Reinforcement-Fine-Tuning-LLMs-with-GRPO

ahmecse

Overview

Name

Source

ahmecse

Episodes

Robot count

Format

other

Description

RFT with GRPO: Reinforcement fine-tuning (RFT) uses reinforcement learning (RL) to adapt LLMs to complex reasoning tasks such as math and coding, letting models develop their own strategies rather than imitating examples as in supervised fine-tuning (SFT). GRPO (Group Relative Policy Optimization), an RL algorithm tailored to this setting, is well suited to tasks with verifiable outcomes and works well with small datasets.

Robots used

null

Links

HuggingFace dataset

null