Mandarin Chinese Speech Synthesis Corpus (Basic Corpus)
View resource name in all available languages
Corpus de synthèse de la parole du chinois mandarin (corpus de base)
ID:
ELRA-S0228_01
This corpus contains the recordings of 1 native Chinese speaker (female).
The corpus is composed of 20 texts with 109,227 words and has been proofread manually. The corpus contents include: phrases, digit strings, letter strings, uncommon words, neutral tone, final retroflexion, Latin alphabet, interrogative sentences, 282 English words.
The speaker has been recorded in a professional recording studio over 2 channels: microphone and glottis wave (fundamental frequency) signals for a total of 18.2 hours.
Speech samples are stored as sequences of 16-bit 44,1 kHz PCM on two channels. The total data size is 5.67 Gb for a total of 12,679 files. The data is encoded in GB-2312 format.
The transcriptions include labels for four-class pause boundaries.
This database is aimed to be used within text-to-speech and speech synthesis applications.
View resource description in all available languages