dataset
navigation-corpus-ewe-speech
ghananlpcommunity
Overview
Name
navigation-corpus-ewe-speech
Source
ghananlpcommunity
Episodes
0
Robot count
0
Format
parquet
Description
Ewe Speech Segments (sentence splitting)
49,348 speech-text pairs split from long recordings.
Processing pipeline
Source audio from ghananlpcommunity/navigation-corpus-speech-full-ewe
Full-file CTC forced alignment (MMS-300M) for word-level timestamps
Sentence-boundary splits (. ? !); long sentences re-chunked to 16 words
Leading/trailing silence trimmed with VAD (-40 dBFS threshold)
Filtered: min 1.0s, max 15.0s
Original sample rate preserved
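The splitting, trimming, and filtering steps above can be sketched roughly as follows. This is an illustration only, not the dataset authors' code. It assumes word-level timestamps are already available as `(token, start_sec, end_sec)` tuples, reads "re-chunked to 16 words" as "chunks of at most 16 words", and replaces the VAD with a simple per-frame energy threshold at -40 dBFS on float samples in [-1, 1]. All function names and the 160-sample frame length are hypothetical.

```python
import math
from typing import List, Tuple

Word = Tuple[str, float, float]  # (token, start_sec, end_sec), assumed layout

MAX_WORDS = 16              # from the pipeline description
MIN_SEC, MAX_SEC = 1.0, 15.0
SILENCE_DBFS = -40.0


def split_sentences(words: List[Word], max_words: int = MAX_WORDS) -> List[List[Word]]:
    """Split at tokens ending in . ? ! and re-chunk long sentences."""
    sentences, current = [], []
    for w in words:
        current.append(w)
        if w[0].endswith((".", "?", "!")):
            sentences.append(current)
            current = []
    if current:  # trailing words without final punctuation
        sentences.append(current)
    chunks = []
    for s in sentences:
        # Assumption: long sentences are cut into pieces of at most max_words.
        for i in range(0, len(s), max_words):
            chunks.append(s[i:i + max_words])
    return chunks


def frame_dbfs(frame: List[float]) -> float:
    """RMS level of a frame in dBFS, for float samples in [-1, 1]."""
    rms = math.sqrt(sum(x * x for x in frame) / len(frame))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def trim_silence(samples: List[float], frame_len: int = 160,
                 threshold: float = SILENCE_DBFS) -> List[float]:
    """Drop leading/trailing frames quieter than the threshold.

    Energy-threshold stand-in for a VAD; the real pipeline may differ.
    """
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    voiced = [i for i, f in enumerate(frames) if frame_dbfs(f) > threshold]
    if not voiced:
        return []
    start = voiced[0] * frame_len
    end = min(len(samples), (voiced[-1] + 1) * frame_len)
    return samples[start:end]


def keep_duration(n_samples: int, sample_rate: int,
                  min_sec: float = MIN_SEC, max_sec: float = MAX_SEC) -> bool:
    """Apply the min/max duration filter to a trimmed segment."""
    return min_sec <= n_samples / sample_rate <= max_sec
```

Because the durations are computed from the sample count and the segment's own sample rate, the sketch does not need to resample, which is in keeping with the note that the original sample rate is preserved.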
Usage
from…
See the full description on the dataset page: https://huggingface.co/datasets/ghananlpcommunity/navigation-corpus-ewe-speech.
Robots used
null
Links
HuggingFace dataset