dataset

train_grpo.py

kossisoroyce

Overview

Name

train_grpo.py

Source

kossisoroyce

Episodes

Robot count

Format

other

Description

GRPO training script for a Qwen model on the GSM8K dataset. This script trains a Qwen model using GRPO (Group Relative Policy Optimization) on the GSM8K (Grade School Math 8K) dataset. The script leverages transformers, PEFT (Parameter-Efficient Fine-Tuning), and TRL (Transformers Rei
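
The description names GRPO but does not show how it scores completions. As a rough illustration only, not taken from train_grpo.py, the sketch below shows the group-relative advantage step that gives the method its name: several completions are sampled for one prompt, and each reward is normalized against the mean and standard deviation of its own group. The function name, the `eps` constant, the use of the population standard deviation, and the example rewards are all assumptions made for this sketch.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against its own group of sampled completions.

    Hypothetical helper for illustration; the actual script may delegate
    this step to a library and may use a different normalization.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population standard deviation
    # eps keeps the division defined when every reward in the group is equal
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one GSM8K prompt:
# 1.0 if the final answer matched the reference, else 0.0.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)
```

Completions that beat their group's average get a positive advantage and the rest get a negative one, so no separate value model is needed to provide a baseline.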

Robots used

null

Links

HuggingFace dataset

null