dataset

stackexchange_flattened

midwestern-simulation

Overview

Name
stackexchange_flattened
Source
midwestern-simulation
Episodes
0
Robot count
0
Format
parquet
Description
7.6M threads (each a question post plus its answers and comments) from the Stack Exchange network, excluding Stack Overflow. With the Llama 2 tokenizer (32k vocabulary), this should come to roughly 7.94 GT (gigatokens, i.e. billion tokens).
Robots used
null
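As a rough check on the description, dividing the estimated token count by the thread count gives the average length of a flattened thread. This is a back-of-the-envelope sketch that treats both figures as exact, although the description gives them only as approximations ("7.6M", "~7.94GT"). The constant and function names are illustrative and not part of the dataset.

```python
# Figures quoted in the dataset description (both are approximations).
NUM_THREADS = 7.6e6      # "7.6M threads"
TOTAL_TOKENS = 7.94e9    # "~7.94GT" with the Llama 2 tokenizer (32k vocab)


def mean_tokens_per_thread(total_tokens: float, num_threads: float) -> float:
    """Return the average number of tokens in one flattened thread."""
    return total_tokens / num_threads


avg = mean_tokens_per_thread(TOTAL_TOKENS, NUM_THREADS)
print(f"~{avg:,.0f} tokens per thread")  # prints "~1,045 tokens per thread"
```

An average of roughly a thousand Llama 2 tokens per thread indicates that a typical flattened thread is a medium-length document, not a single short post.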

Links