dataset

ARPO-SFT-54K

dongguanting

Overview

Name

ARPO-SFT-54K

Source

dongguanting

Episodes

Robot count

Format

json

Description

Agentic Reinforced Policy Optimization (ARPO) Dataset. This repository contains the datasets associated with the paper "Agentic Reinforced Policy Optimization" (ARPO). ARPO proposes an agentic reinforcement learning algorithm for training multi-turn Large Language Model (LLM)-based agents. It addresses the challenge of balancing intrinsic long-horizon reasoning capabilities with proficiency in multi-turn tool interactions, particularly noting the increased uncertainty in… See the full description on the dataset page: https://huggingface.co/datasets/dongguanting/ARPO-SFT-54K.

Robots used

null

Links

Hugging Face dataset

dongguanting/ARPO-SFT-54K
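
Loading example

The Format field above lists json, but this page does not say how the files are laid out, so the following is only a minimal sketch for reading a file that has already been downloaded from the dataset page. The helper name load_records is hypothetical, and the sketch assumes the file is either a single JSON document or JSON Lines (one object per line); neither layout is confirmed by this page.

```python
import json
from pathlib import Path
from typing import Any, List


def load_records(path: str) -> List[Any]:
    """Read records from a local JSON or JSON Lines file.

    Hypothetical helper: the actual file layout of the dataset is an
    assumption, so both common layouts are handled.
    """
    text = Path(path).read_text(encoding="utf-8")
    try:
        # Case 1: the whole file is one JSON document (array or object).
        data = json.loads(text)
    except json.JSONDecodeError:
        # Case 2: JSON Lines, one JSON value per non-empty line.
        return [json.loads(line) for line in text.splitlines() if line.strip()]
    # A single top-level object is wrapped so callers always get a list.
    return data if isinstance(data, list) else [data]
```

Calling load_records on a local path returns a list of records; inspecting the first record is a reasonable way to discover the field names, which are not documented on this page.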