dataset

pii-ner-corpus-synthetic-controlled

scanpatch


Overview

Name: pii-ner-corpus-synthetic-controlled
Source: scanpatch
Episodes: 0
Robot count: 0
Format: parquet
Robots used: null
Description: PII NER Corpus - Synthetic Controlled. A controlled synthetic dataset for training Named Entity Recognition (NER) models to detect Personally Identifiable Information (PII) in Ukrainian and Russian text. This dataset was generated using a controlled pipeline with human-verified annotation guidelines. The text samples are based on real-world document patterns and annotated using Claude Sonnet 4 with strict quality controls. Dataset Description: This dataset contains text… See the full description on the dataset page: https://huggingface.co/datasets/scanpatch/pii-ner-corpus-synthetic-controlled.
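
Since the corpus is hosted on Hugging Face in parquet format, it can probably be loaded with the `datasets` library. The snippet below is a minimal sketch, not an official recipe: the helper names `dataset_page_url` and `load_corpus` are hypothetical, and the split and column names are not stated on this page, so inspect the returned object before relying on any of them.

```python
# Repository id taken from the Name and Source fields above.
REPO_ID = "scanpatch/pii-ner-corpus-synthetic-controlled"


def dataset_page_url(repo_id: str) -> str:
    """Build the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_corpus(repo_id: str = REPO_ID):
    """Download the corpus from the Hugging Face Hub (hypothetical helper).

    Requires network access and the `datasets` package. The import is kept
    inside the function so this module can be imported without it.
    """
    from datasets import load_dataset

    # Split and column names are assumptions to verify; print the result
    # to see what the dataset actually provides.
    return load_dataset(repo_id)


if __name__ == "__main__":
    print(dataset_page_url(REPO_ID))
```

Calling `load_corpus()` and printing the result should list the available splits and features, which is the safest way to discover the annotation schema before training a model.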

Links