dataset

guru-RL-92k

LLM360

Overview

Name

guru-RL-92k

Source

LLM360

Format

parquet

Description

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Dataset Description

Guru is a curated six-domain dataset for training large language models (LLMs) to perform complex reasoning with reinforcement learning (RL). The dataset contains 91.9K high-quality samples spanning six diverse reasoning-intensive domains, processed through a comprehensive five-stage curation pipeline to ensure both domain diversity and reward verifiability. Dataset… See the full description on the dataset page: https://huggingface.co/datasets/LLM360/guru-RL-92k.

Links

HuggingFace dataset

LLM360/guru-RL-92k