dataset

pii-ner-corpus-synthetic-controlled

scanpatch


Overview

Name

Source

scanpatch

Episodes

Robot count

Format

parquet

Description

PII NER Corpus - Synthetic Controlled

A controlled synthetic dataset for training Named Entity Recognition models to detect Personally Identifiable Information (PII) in Ukrainian and Russian text. This dataset was generated using a controlled pipeline with human-verified annotation guidelines. The text samples are based on real-world document patterns and annotated using Claude Sonnet 4 with strict quality controls.

Dataset Description

This dataset contains text…

See the full description on the dataset page: https://huggingface.co/datasets/scanpatch/pii-ner-corpus-synthetic-controlled.
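Training a token-level NER model usually means turning character-offset entity annotations into per-token BIO tags. The sketch below shows one generic way to do that. It assumes a hypothetical annotation format of `(start_char, end_char, label)` tuples with exclusive ends and hypothetical labels `NAME` and `EMAIL`; the actual column names and label set of this corpus are not stated here and should be checked on the dataset page.

```python
import re
from typing import List, Tuple

# Hypothetical annotation format: (start_char, end_char, label), end exclusive.
Span = Tuple[int, int, str]


def spans_to_bio(text: str, spans: List[Span]) -> List[Tuple[str, str]]:
    """Split text on whitespace and give each token a BIO tag."""
    tagged = []
    for match in re.finditer(r"\S+", text):
        tok_start, tok_end = match.start(), match.end()
        tag = "O"  # default: token is outside any entity
        for start, end, label in spans:
            # A token belongs to a span if their character ranges overlap.
            if tok_start < end and tok_end > start:
                # The first token of a span gets B-, later tokens get I-.
                tag = ("B-" if tok_start <= start else "I-") + label
                break
        tagged.append((match.group(), tag))
    return tagged


text = "Contact Ivan Petrenko at ivan@example.com"
spans = [(8, 21, "NAME"), (25, 41, "EMAIL")]  # hypothetical labels
print(spans_to_bio(text, spans))
```

Whitespace splitting keeps punctuation attached to words; a real pipeline would more likely align spans to the subword tokenizer of the model being trained.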

Robots used

null

Links

HuggingFace dataset

scanpatch/pii-ner-corpus-synthetic-controlled
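Because the corpus is published on the Hugging Face Hub in parquet format, it can be loaded with the `datasets` library's `load_dataset` function using the repository ID above. This is a minimal sketch: the split name `"train"` is an assumption, and the helper name `load_corpus` is hypothetical.

```python
# Repository ID taken from the HuggingFace dataset link above.
DATASET_ID = "scanpatch/pii-ner-corpus-synthetic-controlled"


def load_corpus(split: str = "train"):
    """Download one split of the corpus from the Hugging Face Hub (needs network access)."""
    # Imported lazily so this module can be imported without `datasets` installed.
    from datasets import load_dataset

    # The default split name "train" is an assumption; check the dataset page.
    return load_dataset(DATASET_ID, split=split)
```

Calling `ds = load_corpus()` returns a `datasets.Dataset`; `print(ds)` shows its features and row count, `ds[0]` returns the first record as a dict, and `ds.to_pandas()` converts it to a DataFrame for inspection.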