dataset

Reinforcement-Fine-Tuning-LLMs-with-GRPO

ahmecse

Overview

Name
Reinforcement-Fine-Tuning-LLMs-with-GRPO
Source
ahmecse
Episodes
0
Robot count
0
Format
other
Description
Reinforcement fine-tuning (RFT) with Group Relative Policy Optimization (GRPO). RFT uses reinforcement learning (RL) to adapt LLMs to complex reasoning tasks such as math and coding, letting a model develop its own solution strategies rather than imitate labeled examples as in supervised fine-tuning (SFT). GRPO is an RL algorithm suited to tasks whose outcomes can be verified automatically, and it works well with small datasets.
Robots used
null
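
Example
The description mentions that GRPO suits tasks with verifiable outcomes. As a rough illustration only, the sketch below shows the group-relative step that gives GRPO its name: several completions are sampled for one prompt, each is scored by a checkable reward, and each reward is normalized against its own group. The function names, the exact-match reward, the sample completions, and the use of the population standard deviation are all assumptions made for this example; none of them come from this dataset or from any particular library.

```python
from statistics import mean, pstdev


def exact_match_reward(completion: str, expected: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the answer matches exactly, else 0.0."""
    return 1.0 if completion.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own sampled group.

    Assumption: population standard deviation; some implementations use the
    sample standard deviation instead.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up group of four completions for one prompt whose correct answer is "12".
completions = ["12", "13", "12", "7"]
rewards = [exact_match_reward(c, "12") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions end up with positive advantages and incorrect ones with negative advantages, so the policy update favors answers that beat the group average. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.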

Links

HuggingFace dataset
null