TC-STAR English-Spanish Training Corpora for Machine Translation: Aligned Final Text Editions of EPPS

http://catalog.elra.info/product_info.php?products_id=1033

ID: ELRA-S0250

TC-STAR is a European integrated project focusing on all core technologies for Speech-to-Speech Translation (SST): Automatic Speech Recognition (ASR), Spoken Language Translation (SLT), and Text to Speech Synthesis (TTS).

This corpus consists of 34 million (English) and 38 million (Spanish) running words of bilingual, sentence-segmented and aligned texts in English and Spanish. The texts were obtained from the Final Text Editions provided by the European Parliament (http://www.europarl.europa.eu) and cover April 1996 to September 2004, December 2004 to May 2005, and December 2005 to May 2006. The data is accompanied by tools for further preprocessing.

Distribution

Availability

Available - Restricted Use

Start date: 15/11/2007

Licence

Licence          Restrictions                    Membership               User Nature
ELRA VAR         Commercial Use                  Members of ELRA          Academic
ELRA END USER    Academic - Non Commercial Use   Non Members of ELRA      Academic
ELRA VAR         Commercial Use                  Non Members of ELRA      Academic
ELRA END USER    Academic - Non Commercial Use   Non Members of ELRA      Commercial
ELRA VAR         Commercial Use                  Non Members of ELRA      Commercial
ELRA END USER    Academic - Non Commercial Use   Members of ELRA          Academic
ELRA END USER    Academic - Non Commercial Use   Members of ELRA          Commercial
ELRA VAR         Commercial Use                  Members of ELRA          Commercial

Contact Person

Mapelli Valérie

text
audio

Lexical Conceptual Resource General Information

Lexicon

Monolingual text: Lexical Conceptual Resource Languages

Spanish

Variety: Castilian (Type: Dialect) (2 Gb)

English

Linguality

Linguality type: Monolingual

Size

no size available

Monolingual audio: Lexical Conceptual Resource Languages

Spanish

Variety: Castilian (Type: Dialect) (2 Gb)

English

Linguality

Linguality type: Monolingual

Size

no size available

Resource Creation

Funding Project

TC-STAR

Funding Type: Eu Funds

Metadata

Created: 12/05/2005

Version

Version: 1.0

Last Updated: 15/11/2007
