dataset

Fast-Math-R1-GRPO

RabotniKuma

Overview

Name

Fast-Math-R1-GRPO

Source

RabotniKuma

Episodes

Robot count

Format

csv

Description

This repository contains the second-stage GRPO dataset for the paper "A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning". The dataset is used in the second stage of the training recipe, which aims to improve token efficiency while preserving peak mathematical reasoning performance in large language models (LLMs) through reinforcement learning from online inference (GRPO). We extracted the answers from the 2nd stage SFT… See the full description on the dataset page: https://huggingface.co/datasets/RabotniKuma/Fast-Math-R1-GRPO.

Robots used

null

Links

HuggingFace dataset

RabotniKuma/Fast-Math-R1-GRPO
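
Loading sketch

The following is a minimal sketch of how the dataset might be fetched from the Hugging Face Hub, assuming the `datasets` library is installed and that the repository's CSV files are picked up automatically by `load_dataset`. The helper names `dataset_page_url` and `load_grpo_dataset` are hypothetical and introduced only for illustration. Split and column names are not listed on this page, so the sketch does not assume any.

```python
# Repository id and URL prefix as they appear on this page.
HUB_PREFIX = "https://huggingface.co/datasets/"
REPO_ID = "RabotniKuma/Fast-Math-R1-GRPO"


def dataset_page_url(repo_id: str) -> str:
    """Build the dataset page URL for a Hub repository id (hypothetical helper)."""
    return HUB_PREFIX + repo_id


def load_grpo_dataset(repo_id: str = REPO_ID):
    """Download the dataset from the Hub (hypothetical helper).

    Assumption: the CSV files in the repository are auto-detected by
    `datasets.load_dataset`. The import is done lazily so that this module
    can be imported without the `datasets` package or network access.
    """
    from datasets import load_dataset

    # Returns a DatasetDict keyed by split name; inspect it before
    # relying on any particular split or column.
    return load_dataset(repo_id)
```

Calling `load_grpo_dataset()` requires network access; printing the returned object shows the available splits and columns, which should be checked before use.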