ECPC Corpus (European Comparable and Parallel Corpora of Parliamentary Speeches Archive) – set 1



http://catalog.elra.info/product_info.php?products_id=1329

ID:

ELRA-W0128

The European Comparable and Parallel Corpora of Parliamentary Speeches Archive (ECPC), compiled at the Universitat Jaume I (Spain), is a collection of corpora with metatextual XML tagging, containing speeches from three European chambers: the European Parliament, the British House of Commons, and the Spanish Congreso de los Diputados. It is a bilingual, bidirectional written corpus in English and Spanish, as described by Zanettin (2012). This first set (ECPC_EP-05) covers the European Parliament's 2005 daily sessions in three versions: (1) a "clean" XML version; (2) a POS-tagged version; and (3) a sentence-aligned version. In its raw format, ECPC_EP-05 contains 3,668,476 tokens/words (excluding tagging) in English, distributed over 60 UTF-8 files, and 3,993,867 tokens/words (excluding tagging) in Spanish, also distributed over 60 UTF-8 files.
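The token counts above are stated as excluding tagging. As a rough illustration of what that can mean in practice, the following Python sketch counts whitespace-separated tokens in the text content of an XML document while ignoring element names and attributes. The element and attribute names in the sample (`session`, `speech`, `speaker`, `date`) are hypothetical and are not taken from the ECPC schema, and whitespace splitting is an assumed tokenization that may not match the one used to produce the published figures.

```python
import xml.etree.ElementTree as ET


def count_tokens(xml_text: str) -> int:
    """Count whitespace-separated tokens in the text content of an XML
    document, ignoring tags and attribute values."""
    root = ET.fromstring(xml_text)
    # itertext() yields only character data, never tag or attribute names.
    text = " ".join(root.itertext())
    return len(text.split())


# Hypothetical sample; the real ECPC markup may look different.
sample = (
    '<session date="2005-01-10">'
    '<speech speaker="A">Thank you, Mr President.</speech>'
    '<speech speaker="B">I agree.</speech>'
    "</session>"
)

print(count_tokens(sample))
```

For files on disk, `ET.parse(path).getroot()` could replace `ET.fromstring`, with the per-file counts summed over each language's 60 files.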



Distribution

Availability

Available - Restricted Use

Start date: 21/12/2018

Licence

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Non Members of ELRA

Fee: 0.00

User Nature: Commercial

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Non Members of ELRA

Fee: 0.00

User Nature: Academic

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Members of ELRA

Fee: 0.00

User Nature: Academic

ELRA END USER

Restrictions: Academic - Non Commercial Use

For Members of ELRA

Fee: 0.00

User Nature: Commercial

Contact Person

Mapelli Valérie

text

Monolingual text corpus

Languages

Spanish

Variety: Castilian (Type: Dialect) (2 Gb)

English

Linguality

Linguality type: Monolingual

Multi-linguality type: Parallel

Text Format

Plain text

Size

no size available

Metadata

Created: 12/05/2005

Version

Version: 1.0

Last Updated: 21/12/2018
