dataset
ARPO-RL-Reasoning-10K
dongguanting
Overview
Name
ARPO-RL-Reasoning-10K
Source
dongguanting
Episodes
0
Robot count
0
Format
parquet
Description
Agentic Reinforced Policy Optimization (ARPO) Dataset
This repository contains the datasets associated with the paper "Agentic Reinforced Policy Optimization".
Abstract
Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. In realistic reasoning scenarios, LLMs can often utilize external tools to assist in task-solving processes.…
See the full description on the dataset page: https://huggingface.co/datasets/dongguanting/ARPO-RL-Reasoning-10K
Robots used
null
Links
Hugging Face dataset: https://huggingface.co/datasets/dongguanting/ARPO-RL-Reasoning-10K
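
Since the card lists the format as parquet and the dataset is hosted on Hugging Face, one plausible way to load it is through the `datasets` library. The following is a minimal sketch, not the authors' documented procedure: the helper names `dataset_page_url` and `load_arpo` are hypothetical, and the split and column names are not stated on this card, so the sketch does not assume any.

```python
# Minimal sketch: load the dataset with the Hugging Face `datasets` library.
# Assumptions: `datasets` is installed (pip install datasets) and network
# access is available. Helper names below are illustrative only.

REPO_ID = "dongguanting/ARPO-RL-Reasoning-10K"  # repository ID from this card


def dataset_page_url(repo_id: str = REPO_ID) -> str:
    """Build the dataset page URL for a Hugging Face dataset repository."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_arpo(repo_id: str = REPO_ID):
    """Download and return every available split of the dataset.

    The import is done lazily so that this module can be imported
    without the third-party `datasets` package being installed.
    """
    from datasets import load_dataset

    # With no `split` argument, load_dataset returns a DatasetDict that
    # maps each split name to a Dataset.
    return load_dataset(repo_id)


# Example usage (requires network access):
#   ds = load_arpo()
#   print(ds)  # lists the splits, their columns, and row counts
```

Printing the returned object is a reasonable first step, because it reveals the actual split and column names before any further processing is written against them.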