dataset
streaming-phi-deidentification-benchmark
vkatg
Overview
Name
streaming-phi-deidentification-benchmark
Source
vkatg
Episodes
0
Robot count
0
Format
json
Description
Streaming PHI De-Identification Benchmark
Most PHI de-identification benchmarks evaluate a single document in isolation. That is not how clinical data actually moves. A patient's name appears in a clinical note, then in an ASR transcript ten minutes later, then in imaging metadata an hour after that. Each event looks low-risk on its own. The cumulative exposure across modalities is what creates re-identification risk.
This dataset captures that cumulative, cross-modal exposure. Every record is fully synthetic. It… See the full description on the dataset page: https://huggingface.co/datasets/vkatg/streaming-phi-deidentification-benchmark.
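The point about cumulative exposure can be illustrated with a small sketch. Everything in it is an assumption made for illustration: the event tuples, the modality names, the per-event risk values, and the independence model `1 - prod(1 - r_i)` are hypothetical and are not taken from the dataset's schema or scoring method.

```python
from collections import defaultdict

# Hypothetical event stream: (patient_id, modality, per-event risk in [0, 1]).
# Field names and risk values are illustrative, not the dataset's schema.
events = [
    ("p1", "clinical_note", 0.2),
    ("p1", "asr_transcript", 0.2),
    ("p1", "imaging_metadata", 0.2),
]


def cumulative_exposure(events):
    """Yield (patient_id, modality, running_risk) after each event.

    Assumed toy model: each event is an independent chance of
    re-identification, so the running risk is 1 - prod(1 - r_i).
    """
    # Probability that a patient has not yet been re-identified.
    not_identified = defaultdict(lambda: 1.0)
    for patient_id, modality, risk in events:
        not_identified[patient_id] *= 1.0 - risk
        yield patient_id, modality, 1.0 - not_identified[patient_id]


trace = list(cumulative_exposure(events))
for patient_id, modality, running_risk in trace:
    print(f"{patient_id} {modality:<17} running risk = {running_risk:.3f}")
```

Under this toy model, three events that each carry a risk of 0.2 push the running risk for the same patient well above any single event's risk, which is the effect a per-document evaluation would miss.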
Robots used
null
Links
Hugging Face dataset: https://huggingface.co/datasets/vkatg/streaming-phi-deidentification-benchmark