dataset
Reinforcement-Fine-Tuning-LLMs-with-GRPO
ahmecse
Overview
Name
Reinforcement-Fine-Tuning-LLMs-with-GRPO
Source
ahmecse
Episodes
0
Robot count
0
Format
other
Description
Reinforcement fine-tuning (RFT) with Group Relative Policy Optimization (GRPO). RFT uses reinforcement learning (RL) to adapt LLMs to complex reasoning tasks such as math and coding, letting the model develop its own strategies rather than imitate labeled examples as in supervised fine-tuning (SFT). GRPO is an RL algorithm suited to tasks with verifiable outcomes, and it works well with small datasets.
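As a rough illustration of the "group relative" idea in GRPO, the sketch below scores several completions sampled for one prompt with a verifiable reward, then normalizes each reward against the group's mean and standard deviation. The function names, the exact-match reward, and the sample completions are hypothetical and are not taken from this dataset. Using the population standard deviation is also an assumption.

```python
from statistics import mean, pstdev


def exact_match_reward(completion: str, expected: str) -> float:
    """Hypothetical verifiable reward: 1.0 if the answer matches, else 0.0."""
    return 1.0 if completion.strip() == expected.strip() else 0.0


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of completions for the same prompt.

    Each advantage is (reward - group mean) / (group std + eps), so a
    completion is judged relative to its siblings, not on an absolute scale.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up completions for a single prompt whose correct answer is "42".
completions = ["42", "41", "42 ", "seven"]
rewards = [exact_match_reward(c, "42") for c in completions]
advantages = group_relative_advantages(rewards)
```

Correct completions receive positive advantages and incorrect ones negative advantages. If every completion in a group earns the same reward, all advantages are zero and that group contributes no learning signal.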
Robots used
null
Links
HuggingFace dataset
null