dataset
train_grpo.py
kossisoroyce
Overview
Name
train_grpo.py
Source
kossisoroyce
Episodes
0
Robot count
0
Format
other
Description
GRPO Training Script for Qwen Model on GSM8K Dataset. This script trains a Qwen model using the GRPO (Group Relative Policy Optimization) method on the GSM8K (Grade School Math 8K) dataset. The script leverages transformers, PEFT (Parameter-Efficient Fine-Tuning), and TRL (Transformers Rei
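For readers unfamiliar with GRPO, the core idea is that several completions are sampled for the same prompt and each one is scored relative to the others in its group. The sketch below illustrates that normalization step only. It is not taken from train_grpo.py; the function name group_relative_advantages, the eps constant, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize the rewards of one prompt's sampled completions.

    Hypothetical helper: each reward is centered on the group mean and
    divided by the group standard deviation (population form assumed here).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion gets the same reward
    return [(r - mu) / (sigma + eps) for r in rewards]


# Example: four completions for one GSM8K-style question,
# rewarded 1.0 when the final answer is correct and 0.0 otherwise.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
```

Correct completions end up with positive advantages and incorrect ones with negative advantages. When every completion in a group receives the same reward, all advantages are zero and that group contributes no learning signal.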
Robots used
null
Links
HuggingFace dataset
null