dataset

ARPO-RL-Reasoning-10K

dongguanting

Overview

Name
ARPO-RL-Reasoning-10K
Source
dongguanting
Episodes
0
Robot count
0
Format
parquet
Description
Agentic Reinforced Policy Optimization (ARPO) Dataset. This repository contains the datasets associated with the paper "Agentic Reinforced Policy Optimization". Abstract: Large-scale reinforcement learning with verifiable rewards (RLVR) has demonstrated its effectiveness in harnessing the potential of large language models (LLMs) for single-turn reasoning tasks. In realistic reasoning scenarios, LLMs can often utilize external tools to assist in task-solving processes.… See the full description on the dataset page: https://huggingface.co/datasets/dongguanting/ARPO-RL-Reasoning-10K.
Robots used
null

Links