dataset

DDRBench_10K_trajectory

thinkwee

Overview

Name
DDRBench_10K_trajectory
Source
thinkwee
Episodes
0
Robot count
0
Format
json
Description
10K Agent Trajectories Dataset. This dataset contains agent trajectories from the Deep Data Research (DDR) project's 10-K financial analysis task, as presented in the paper "Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models". DDR-Bench is a large-scale benchmark designed to evaluate "investigatory intelligence" in LLM agents: the autonomy to set goals and explore raw data without explicit queries. This… See the full description on the dataset page: https://huggingface.co/datasets/thinkwee/DDRBench_10K_trajectory.
Robots used
null

Links