HESITA – META-SHARE

Last view: 2026-03-18

HESITA

ID:

hesita-id

The HESITA Corpus is composed of the audio and the manual transcriptions of HESITAtion events from broadcast news in European Portuguese. The corpus includes the audio signal of 30 daily news programs collected in September 2011 from a Portuguese television channel podcast. The audio was downsampled from 44.1 kHz to a 16 kHz sampling rate, and the video information was discarded. The corpus contains a total of 27 hours of audio and speech, in which acoustic environment conditions and hesitations were manually transcribed by several trained annotators. The audio material contains studio and out-of-studio recordings, as well as sessions recorded over the telephone. It comprises speech (which may occur over background speech, noise and music) as well as non-speech events (music, jingles, laughter, coughing or clapping). Prepared (read) speaking style is dominant. For a more complete description of the corpus and a report on the automatic characterization of the hesitation events, the reader may refer to (Veiga et al., 2012a and 2012b), (Veiga et al., 2011) and (Candeias et al., 2013).

Distribution

Availability

Available - Unrestricted Use

Licence

CC - BY - NC - SA

Restrictions: Share Alike

Download location: hidden

Distribution Access/Medium: Downloadable

Contact Person

text
audio

Monolingual text corpus

Languages

Portuguese

Linguality

Linguality type: Monolingual

Size

58 Files

Monolingual audio corpus

Languages

Portuguese

Linguality

Linguality type: Monolingual

Size

58 Files

Effective speech duration

27 Hours

Metadata

Created: 28/01/2013

Last Updated: 28/01/2013

Metadata Language: English (en)

Metadata Creator

Validation

Validated

Type of Validation: Content

Documentation

Document Type: Unpublished

Sara Candeias, Narrative description, https://www.l2f.ines... , 2013
