European Parliament Interpretation Corpus (EPIC) – META-SHARE

Last view: 2025-12-19


European Parliament Interpretation Corpus (EPIC)


http://catalog.elra.info/product_info.php?products_id=1145

ID: ELRA-S0323

The EPIC corpus is a parallel corpus of European Parliament speeches and their corresponding simultaneous interpretations. This corpus includes source speeches in Italian, English and Spanish and interpreted speeches in all possible combinations and directions (from English into Italian and Spanish; from Italian into English and Spanish; and from Spanish into Italian and English). It contains a total of 357 speeches (177,295 words).
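With three languages, "all possible combinations and directions" amounts to every ordered pair of distinct languages: 3 × 2 = 6 interpretation directions. Adding one source subcorpus per language gives the nine subcorpora in the table. The following Python sketch enumerates those labels; the ORG-/INT- naming follows the table, while the helper name `subcorpus_labels` is hypothetical and used only for illustration.

```python
from itertools import permutations

# Two-letter codes as used in the EPIC subcorpus labels
LANGUAGES = ["EN", "IT", "ES"]


def subcorpus_labels(languages):
    """Build one ORG-<src> label per source language and one
    INT-<src>-<tgt> label per ordered pair of distinct languages.
    (Hypothetical helper, for illustration only.)"""
    sources = [f"ORG-{src}" for src in languages]
    # permutations(..., 2) yields every ordered (source, target) pair with source != target
    interpretations = [f"INT-{src}-{tgt}" for src, tgt in permutations(languages, 2)]
    return sources + interpretations


labels = subcorpus_labels(LANGUAGES)
print(len(labels))  # 9
print(labels)
```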

The EPIC corpus includes video clips of each source-language speaker, audio clips of the corresponding interpreted target speeches, and transcripts of all the clips. The corpus has been orthographically transcribed. Annotation covers paralinguistic features (e.g. truncated and mispronounced words) and metadata (a header at the beginning of each transcript and information about the speaker and the speech). The transcripts are part-of-speech (POS) tagged and lemmatised; untagged transcripts in plain-text format are also available.

Size of the nine subcorpora in the EPIC corpus:

subcorpus / number of speeches / total word count / % of EPIC (by word count)
ORG-EN (source) / 81 / 42,705 / 25
INT-EN-IT (interpretation) / 81 / 35,765 / 20
INT-EN-ES (interpretation) / 81 / 38,066 / 21
ORG-IT (source) / 17 / 6,765 / 4
INT-IT-EN (interpretation) / 17 / 6,708 / 4
INT-IT-ES (interpretation) / 17 / 7,052 / 4
ORG-ES (source) / 21 / 14,406 / 8
INT-ES-IT (interpretation) / 21 / 12,833 / 7
INT-ES-EN (interpretation) / 21 / 12,995 / 7
TOTAL / 357 / 177,295 / 100
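The TOTAL row can be checked by summing the subcorpus rows. The Python sketch below does this with the counts copied from the table and also recomputes each subcorpus's share of the total word count. The percentage column should be read as approximate: recomputed from the word counts, ORG-EN accounts for about 24.1% of EPIC, where the table lists 25. The dictionary and function names are illustrative assumptions.

```python
# (speeches, words) per subcorpus, copied from the table above
SUBCORPORA = {
    "ORG-EN": (81, 42705),
    "INT-EN-IT": (81, 35765),
    "INT-EN-ES": (81, 38066),
    "ORG-IT": (17, 6765),
    "INT-IT-EN": (17, 6708),
    "INT-IT-ES": (17, 7052),
    "ORG-ES": (21, 14406),
    "INT-ES-IT": (21, 12833),
    "INT-ES-EN": (21, 12995),
}


def totals(subcorpora):
    """Return (total speeches, total words) across all subcorpora."""
    speeches = sum(n for n, _ in subcorpora.values())
    words = sum(w for _, w in subcorpora.values())
    return speeches, words


def word_shares(subcorpora, total_words):
    """Percentage of the total word count in each subcorpus, to one decimal place."""
    return {name: round(100 * w / total_words, 1) for name, (_, w) in subcorpora.items()}


speeches, words = totals(SUBCORPORA)
print(speeches, words)  # 357 177295
for name, pct in word_shares(SUBCORPORA, words).items():
    print(f"{name}: {pct}%")
```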

The EPIC corpus was developed by a multidisciplinary research group based at the Department of Interdisciplinary Studies in Translation, Languages and Cultures (University of Bologna at Forlì), involving interpreting scholars, corpus linguists and IT technicians: Mariachiara Russo (coordinator), Claudio Bendazzoli, Cristina Monti, Annalisa Sandrelli, Marco Baroni, Silvia Bernardini, Gabriele Mack, Lorenzo Piccioni, Eros Zanchetta, Elio Ballardini, Peter Mead.



Distribution

Availability: Available - Restricted Use

Start date: 22/11/2011

Licence

ELRA END USER (for members of ELRA)
Restrictions: Academic - Non Commercial Use
Fee: 0.00
User Nature: Academic

ELRA END USER (for non-members of ELRA)
Restrictions: Academic - Non Commercial Use
Fee: 0.00
User Nature: Academic

Contact Person: Valérie Mapelli

Media types: audio, video

Monolingual audio corpus

Languages: English, Italian, Spanish (variety: Castilian; type: dialect) (2 Gb)

Linguality type: Monolingual

Size: no size available

Annotation: Other

Monolingual video corpus

Languages: English, Italian, Spanish (variety: Castilian; type: dialect) (2 Gb)

Linguality type: Monolingual

Size: no size available

Metadata

Created: 12/05/2005

Version

Version: 1.0

Last Updated: 23/11/2011

Usage

Actual use: NLP applications

Specific NLP use: speech recognition
