DDRBench_10K_trajectory

thinkwee

or hover any field below to flag it

Overview

Name

Source

thinkwee

Episodes

Robot count

Format

json

Description

10K Agent Trajectories Dataset

This dataset contains agent trajectories from the Deep Data Research (DDR) project's 10-K financial analysis task, as presented in the paper "Hunt Instead of Wait: Evaluating Deep Data Research on Large Language Models". DDR-Bench is a large-scale benchmark designed to evaluate "investigatory intelligence" in LLM agents: the autonomy to set goals and explore raw data without explicit queries. This…

See the full description on the dataset page: https://huggingface.co/datasets/thinkwee/DDRBench_10K_trajectory

Robots used

None

Links

Hugging Face dataset

thinkwee/DDRBench_10K_trajectory