dataset
ARPO-SFT-54K
dongguanting
Overview
Name
ARPO-SFT-54K
Source
dongguanting
Episodes
0
Robot count
0
Format
json
Description
Agentic Reinforced Policy Optimization (ARPO) Dataset
This repository contains the datasets associated with the paper Agentic Reinforced Policy Optimization (ARPO).
ARPO proposes a novel agentic Reinforcement Learning algorithm designed for training multi-turn Large Language Model (LLM)-based agents. It addresses the challenge of balancing intrinsic long-horizon reasoning capabilities with proficiency in multi-turn tool interactions, particularly noting the increased uncertainty in…
See the full description on the dataset page: https://huggingface.co/datasets/dongguanting/ARPO-SFT-54K
Robots used
null
Links
HuggingFace dataset: https://huggingface.co/datasets/dongguanting/ARPO-SFT-54K