dataset

ARPO-RL-Reasoning-10K

dongguanting


Overview

Name

Source

dongguanting

Episodes

Robot count

Format

parquet

Description

Agentic Reinforced Policy Optimization (ARPO) Dataset

This repository contains the datasets associated with the paper "Agentic Reinforced Policy Optimization".

Abstract: Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. In realistic reasoning scenarios, LLMs can often utilize external tools to assist in task-solving processes.…

See the full description on the dataset page: https://huggingface.co/datasets/dongguanting/ARPO-RL-Reasoning-10K.

Robots used

null

Links

HuggingFace dataset

dongguanting/ARPO-RL-Reasoning-10K
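
Since the card lists the format as parquet and links to a Hugging Face repository, one plausible way to fetch the data is through the `datasets` library. The sketch below is not taken from the dataset page: the helper names `dataset_url` and `load_arpo` are hypothetical, and no split or column names are assumed because this card does not list them.

```python
REPO_ID = "dongguanting/ARPO-RL-Reasoning-10K"  # repository id from the Links field


def dataset_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_arpo(repo_id: str = REPO_ID):
    """Download the dataset from the Hub (hypothetical helper).

    Needs network access and `pip install datasets`. No split or column
    names are assumed here; inspect the returned object to see what the
    repository actually publishes.
    """
    # Imported lazily so dataset_url() works without the library installed.
    from datasets import load_dataset

    return load_dataset(repo_id)


# Example usage (requires network access):
#   data = load_arpo()
#   print(data)  # shows the available splits, columns, and row counts
```

Printing the returned object is the safest first step, since it reveals the real split and column names before any downstream code depends on them.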