dataset
guru-RL-92k
LLM360
Overview
Name
guru-RL-92k
Source
LLM360
Episodes
0
Robot count
0
Format
parquet
Description
Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective
Dataset Description
Guru is a curated six-domain dataset for training large language models (LLMs) to perform complex reasoning with reinforcement learning (RL). The dataset contains 91.9K high-quality samples spanning six diverse reasoning-intensive domains, processed through a five-stage curation pipeline to ensure both domain diversity and reward verifiability.
Dataset… See the full description on the dataset page: https://huggingface.co/datasets/LLM360/guru-RL-92k.
Robots used
null
Links
Hugging Face dataset: https://huggingface.co/datasets/LLM360/guru-RL-92k