INTERA Corpus - the English structurally annotated part of the BG-EN pair

The English part of the BG-EN (Bulgarian-English) pair of the INTERA corpus: written, domain-specific text (law, education); approximately 1 million words; XCES format.

Distribution

Availability

Available - Restricted Use

Licence

CC-BY

Restrictions: Attribution

Distribution Access/Medium: Downloadable

Attribution Details: The INTERA Corpus - the Bulgarian part of the BG-EN pair of the ILSP/RC Athena licensed under CC-BY as accessed via META-SHARE

Contact Person

Maria Gavrilidou

text

Monolingual text corpus

Languages

English (1,000,000 Words)

Linguality

Linguality type: Monolingual

Text Format

application/x-xces+xml

Size

1,000,000 Words

Character encoding

UTF-8

Domains

law

education

Modalities

Written Language

Annotation

Structural Annotation

StandOff: False

Segmentation level: Sentence

Format: application/x-xces+xml

Standard practices conformance: XCES
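
As an illustration of what inline (non-standoff), sentence-level structural annotation in XCES can look like in practice, the following is a minimal sketch that counts sentences and whitespace-separated tokens. The sample document, the `s` element name for sentences, and the helper names are assumptions made for illustration; they are not taken from the corpus files themselves, so check them against the actual distribution before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical XCES-style fragment; the real corpus files may use
# different element names, attributes, or an XML namespace.
SAMPLE = """<cesDoc version="1.0">
  <text>
    <body>
      <p id="p1">
        <s id="s1">The Council adopted the regulation.</s>
        <s id="s2">It enters into force next year.</s>
      </p>
    </body>
  </text>
</cesDoc>"""


def local_name(tag):
    """Strip an optional '{namespace}' prefix from an element tag."""
    return tag.rsplit("}", 1)[-1]


def iter_sentences(root, sentence_tag="s"):
    """Yield (id, text) for each sentence element, in document order."""
    for elem in root.iter():
        if local_name(elem.tag) == sentence_tag:
            # Collapse internal whitespace in the sentence text.
            text = " ".join("".join(elem.itertext()).split())
            yield elem.get("id"), text


def corpus_stats(xml_text):
    """Count sentence elements and whitespace-separated tokens."""
    root = ET.fromstring(xml_text)
    sentences = list(iter_sentences(root))
    words = sum(len(text.split()) for _, text in sentences)
    return {"sentences": len(sentences), "words": words}


if __name__ == "__main__":
    print(corpus_stats(SAMPLE))
```

Whitespace tokenization only approximates a word count, so figures obtained this way need not match the size stated in this record.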

Creation

Creation mode details: web crawling; manual selection; semi-automatic conversion to the desired formats

Creation mode: Mixed

Original Sources

various texts found mainly over the internet

Resource Creation

Creation lasted: 01/01/2003 - 31/12/2004

Funding Project

Integrated European language data Repository Area (INTERA - e-content EDC-22076 INTERA / 27924)

URL: http://www.elda.org/...

Funding Type: EU Funds

Funder: eContent

Project duration: 01/01/2003 - 31/12/2004

Metadata

Created: 02/02/2012

Last Updated: 08/01/2016

Usage

Foreseen Use - NLP Applications

Use NLP Specific: Machine Translation

Actual Use - NLP Applications

Use NLP Specific: Terminology Extraction

Relation

Related Resource: INTERA corpus

Relation Type: isPartOf

Documentation

Document Type: In Proceedings

Maria Gavrilidou and Penny Labropoulou and Elina Desipri and Voula Giouli et al., Building parallel corpora for eContent professionals, COLING 2004, 2004

Book Title: Proceedings of COLING 2004

Document Type: In Proceedings

Maria Gavrilidou and Penny Labropoulou and Stelios Piperidis et al., Language resources production models: the case of INTERA multilingual corpus and terminology, 5th International Conference on Language Resources and Evaluation (LREC-2006), 2006

Book Title: Proceedings of the 5th International Conference on Language Resources and Evaluation (LREC-2006)

Document Type: In Proceedings

Maria Gavrilidou and Penny Labropoulou and Monica Monachini and Stelios Piperidis and Claudia Soria, Building Multilingual Terminological Resources, RANLP 2005 International Workshop on Language and Speech Infrastructure for Information Access in the Balkan Countries, 2005

Book Title: Proceedings of the RANLP 2005 International Workshop on Language and Speech Infrastructure for Information Access in the Balkan Countries

Document Type: Tech Report

Maria Gavrilidou and Voula Giouli and Elina Desipri and Penny Labropoulou and Monica Monachini et al., D5.2 - Report on the multilingual resources production, http://www.elda.org/..., 2004
