dataset
pii-ner-corpus-synthetic-controlled
scanpatch
Overview
Name
pii-ner-corpus-synthetic-controlled
Source
scanpatch
Episodes
0
Robot count
0
Format
parquet
Description
PII NER Corpus - Synthetic Controlled
A controlled synthetic dataset for training Named Entity Recognition models to detect Personally Identifiable Information (PII) in Ukrainian and Russian text.
This dataset was generated using a controlled pipeline with human-verified annotation guidelines. The text samples are based on real-world document patterns and annotated using Claude Sonnet 4 with strict quality controls.
Dataset Description
This dataset contains text… See the full description on the dataset page: https://huggingface.co/datasets/scanpatch/pii-ner-corpus-synthetic-controlled.
Robots used
null
Links
Hugging Face dataset: https://huggingface.co/datasets/scanpatch/pii-ner-corpus-synthetic-controlled
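
Loading the data
Since the card lists the format as parquet and points to a Hugging Face dataset page, the snippet below is a minimal sketch of how the corpus might be loaded with the `datasets` library. The helper names `dataset_url` and `load_corpus` are hypothetical and introduced only for illustration. The repository id is assembled from the Source and Name fields above. Split names, column names, and access restrictions are not stated on this page, so the sketch makes no assumptions about them.

```python
# Repository id assembled from the card's Source and Name fields.
REPO_ID = "scanpatch/pii-ner-corpus-synthetic-controlled"


def dataset_url(repo_id: str = REPO_ID) -> str:
    """Return the Hugging Face dataset page URL for a repository id."""
    return f"https://huggingface.co/datasets/{repo_id}"


def load_corpus(repo_id: str = REPO_ID):
    """Download and load the corpus from the Hugging Face Hub.

    Assumes `pip install datasets` and network access. The import is done
    lazily so this module can be imported without the library installed.
    """
    from datasets import load_dataset

    # Splits and columns are not documented on this page; inspect the
    # returned object to discover them.
    return load_dataset(repo_id)


# Example usage (requires network access, so it is left commented out):
# corpus = load_corpus()
# print(corpus)
```

Printing the returned object is the quickest way to see which splits and columns actually exist before writing any NER preprocessing against them.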