Fast-Math-R1-GRPO

RabotniKuma

Overview

Name
Fast-Math-R1-GRPO
Source
RabotniKuma
Episodes
0
Robot count
0
Format
csv
Description
This repository contains the second-stage GRPO dataset for the paper A Practical Two-Stage Recipe for Mathematical LLMs: Maximizing Accuracy with SFT and Efficiency with Reinforcement Learning. The dataset is used in the second stage of the training recipe, which aims to improve token efficiency while preserving peak mathematical reasoning performance in Large Language Models (LLMs) through reinforcement learning with online inference (GRPO, Group Relative Policy Optimization). We extracted the answers from the 2nd stage SFT… See the full description on the dataset page: https://huggingface.co/datasets/RabotniKuma/Fast-Math-R1-GRPO.
Robots used
null
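GRPO samples a group of completions for each prompt and scores each completion relative to the rest of its group, so no separate value model is needed. The sketch below shows only the generic group-relative advantage step: each reward is normalized by its group's mean and standard deviation. It is not the Fast-Math-R1 training code. The function name `group_relative_advantages`, the `eps` parameter, the use of population standard deviation, and the example rewards are all illustrative assumptions. The reward design actually used with this dataset (for example, any accuracy, length, or format terms) is not described on this page.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-6) -> list[float]:
    """Normalize rewards within one group of completions sampled for the same prompt.

    Hypothetical helper illustrating the generic GRPO advantage:
    A_i = (r_i - mean(r)) / (std(r) + eps).
    """
    mu = mean(rewards)
    # Population std is an assumption; some implementations use the sample std instead.
    sigma = pstdev(rewards)
    # eps avoids division by zero when every completion in the group gets the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four completions of one math problem (1.0 = correct final answer).
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print([round(a, 3) for a in advantages])
```

With these rewards the group mean is 0.5 and the standard deviation is 0.5, so correct completions receive an advantage of about +1 and incorrect ones about -1. A group in which every completion earns the same reward yields all-zero advantages and therefore contributes no learning signal.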

Links