TC-STAR English Training Corpora for ASR: Recordings of EPPS Speech




http://catalog.elra.info/product_info.php?products_id=1034,

http://catalog.elra.info/product_info.php?products_id=1035

ID: ELRA-S0251

TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text to Speech Synthesis (TTS).

This corpus consists of around 290 hours of recordings of EPPS (European Parliament Plenary Sessions) speeches held or interpreted in European English (a mixture of native and non-native English). Of these, 92 hours were annotated (transcribed); the transcriptions are not included in the present package. The recordings were obtained from Europe by Satellite (EbS, http://europa.eu.it/comm/ebs) between May 2004 and May 2006.

The speech signals were provided by EbS via the internet in Real Media format and via satellite in MPEG-1 Layer 2 format. The signals were decoded, resampled, and stored in WAVE RIFF (Resource Interchange File Format). Each file contains a single channel with 16-bit resolution at a sample rate of 16 kHz.
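As an illustration of the file format described above, the following minimal sketch checks whether a WAVE file is single-channel, 16-bit, 16 kHz, using only Python's standard `wave` module. The function names (`describe_wav`, `matches_corpus_format`, `estimated_pcm_bytes`) are hypothetical helpers and are not part of the corpus distribution. The size estimate assumes uncompressed PCM and ignores file headers, so it is only a rough figure.

```python
import wave

# Format stated in the corpus description: mono, 16-bit, 16 kHz.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 16000}


def describe_wav(path):
    """Read the basic audio parameters of a WAVE file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
            "duration_s": wav.getnframes() / wav.getframerate(),
        }


def matches_corpus_format(path):
    """Return True if the file is mono, 16-bit, 16 kHz."""
    info = describe_wav(path)
    return all(info[key] == value for key, value in EXPECTED.items())


def estimated_pcm_bytes(hours, sample_rate_hz=16000, sample_width_bytes=2, channels=1):
    """Rough size of uncompressed PCM audio, excluding file headers."""
    return int(hours * 3600 * sample_rate_hz * sample_width_bytes * channels)
```

Under these assumptions, `estimated_pcm_bytes(290)` gives 33,408,000,000 bytes, that is, roughly 33 GB of raw audio for about 290 hours. This is an estimate derived from the stated format, not a figure published for the resource.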

The speech databases made within the TC-STAR project were validated by SPEX, in the Netherlands, to assess their compliance with the TC-STAR format and content specifications.

For corresponding transcriptions, see ELRA-S0249.




Distribution

Availability

Available - Restricted Use

Start date: 15/11/2007

Licence

ELRA VAR; Restrictions: Commercial Use; For Members of ELRA; Fee: 600.00; User Nature: Academic

ELRA END USER; Restrictions: Academic - Non Commercial Use; For Non Members of ELRA; Fee: 520.00; User Nature: Academic

ELRA VAR; Restrictions: Commercial Use; For Non Members of ELRA; Fee: 800.00; User Nature: Academic

ELRA END USER; Restrictions: Academic - Non Commercial Use; For Non Members of ELRA; Fee: 800.00; User Nature: Commercial

ELRA VAR; Restrictions: Commercial Use; For Non Members of ELRA; Fee: 800.00; User Nature: Commercial

ELRA END USER; Restrictions: Academic - Non Commercial Use; For Members of ELRA; Fee: 400.00; User Nature: Academic

ELRA END USER; Restrictions: Academic - Non Commercial Use; For Members of ELRA; Fee: 600.00; User Nature: Commercial

ELRA VAR; Restrictions: Commercial Use; For Members of ELRA; Fee: 600.00; User Nature: Commercial

Contact Person

Mapelli Valérie

audio

Monolingual audio corpus

Languages

English

Linguality

Linguality type: Monolingual

Size

no size available

Resource Creation

Funding Project

TC-STAR

Funding Type: EU Funds

Metadata

Created: 12/05/2005

Version

Version: 1.0

Last Updated: 15/11/2007

Usage

Actual Use - NLP Applications

Use NLP Specific: Speech Recognition
