PERSO Corpus


PERSO

ID:

http://urn.fi/urn:nbn:fi:lb-2014073053

http://islrn.org/resources/651-108-565-673-5

The PERSO speech database is the result of two research projects: Personalized Hidden Markov Modeling-based Text-To-Speech Synthesis: Assistive Technology for People with Communication Disabilities (funded by Tekes) and Adaptive spatial model for vocal expression: emotional speech synthesis for Finnish (funded by the Academy of Finland). The projects were carried out between 2009 and 2012, and the speech database was collected in 2011 and 2012. These projects are followed by the international Simple4All speech synthesis project, funded by the EU.

The purpose of these projects is to create more appropriate speech synthesis options for TTS (Text-To-Speech) conversion applications such as assistive communication devices. Speech synthesis products are often generic and their application possibilities are narrow, so there is a clear need for a wider range of synthesis voices and styles. The PERSO corpus consists of single-speaker databases of Finnish read and spontaneous speech from 33 men and 33 women: 60 smaller databases (~40 minutes of continuous speech per subject) and 6 larger databases (~4 hours of continuous speech per subject). The speech data are packaged with associated text files.
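As a rough consistency check on the figures above, the following Python sketch estimates the total amount of speech from the approximate per-subject durations. The constant and function names are illustrative assumptions and are not part of the corpus distribution. Because the per-subject durations are only approximate, the result should be read as a ballpark figure to compare against the size stated in the record (60 hours), not as an exact total.

```python
# Approximate figures taken from the corpus description; names are hypothetical.
SMALL_DATABASES = 60        # smaller single-speaker databases
SMALL_MINUTES_EACH = 40     # ~40 minutes of continuous speech per subject
LARGE_DATABASES = 6         # larger single-speaker databases
LARGE_HOURS_EACH = 4        # ~4 hours of continuous speech per subject


def estimated_total_hours() -> float:
    """Return a rough estimate of the total speech duration in hours."""
    small_hours = SMALL_DATABASES * SMALL_MINUTES_EACH / 60
    large_hours = LARGE_DATABASES * LARGE_HOURS_EACH
    return small_hours + large_hours


if __name__ == "__main__":
    speakers = SMALL_DATABASES + LARGE_DATABASES
    print(f"{speakers} speakers, roughly {estimated_total_hours():.0f} hours")
```

The speaker count matches the 33 men and 33 women mentioned in the description, and the estimated duration is of the same order as the stated corpus size.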

The PERSO corpus will be published at https://lat.csc.fi/ for non-commercial scientific use only.

For detailed information on the license of the resource see https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/ClarinEulaEngACANc.

More information on the corpus: http://blogs.helsinki.fi/phonetics/category/projects/perso/


Distribution

Availability

Available - Restricted Use

Licence

CLARIN ACA - NC

Restrictions: Academic - Non Commercial Use, Attribution, No Redistribution

Distribution Access/Medium: Accessible Through Interface

Execution location: hidden

Attribution Details: Kallio, H. (2013) PERSO Databases for Finnish speech synthesis. Institute of Behavioural Sciences, University of Helsinki.

User Nature: Academic

Licensors:

Martti Vainio

Heini Kallio

Distribution rights holders:

University of Helsinki

CSC - Tieteen tietotekniikan keskus Oy, CSC - IT Center for Science Ltd

IPR Holder

Contact Persons

text
audio

Monolingual text corpus

Languages

Finnish

Linguality

Linguality type: Monolingual

Size

43,000 Utterances

Modalities

Written Language

Monolingual audio corpus

Languages

Finnish

Linguality

Linguality type: Monolingual

Size

60 Hours

43,000 Utterances

Modalities

Spoken Language, Voice

Metadata

Created: 25/06/2013

Last Updated: 25/04/2014

Metadata Language: English (en)

Metadata Creator

Imre Bartis

Usage

Foreseen Use: NLP Applications

Use NLP Specific: Text To Speech Synthesis
