ILSP/ELEFTHEROTYPIA Corpus (Greek corpus) – META-SHARE

Last view: 2026-02-10


ILSP/ELEFTHEROTYPIA Corpus (Greek corpus)


http://catalog.elra.info/product_info.php?products_id=763

ID: ELRA-W0022

The ILSP/ELEFTHEROTYPIA Corpus contains approximately 3 million words, classified and annotated according to the common core PAROLE encoding standard: each file is classified by the parameters Medium, Topic and Genre, and is structurally annotated at paragraph level (CES Level 1). The corpus is distributed as SGML files, all drawn from the Greek daily newspaper ELEFTHEROTYPIA.
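Because the files are annotated at paragraph level in SGML, extracting paragraph text is a natural first step when working with the corpus. The Python sketch below assumes that paragraphs are tagged as <p>...</p> with explicit end tags; this is an assumption about the CES Level 1 markup, not a documented property of ELRA-W0022, and the names ParagraphExtractor and extract_paragraphs are hypothetical. It relies on the standard library's lenient html.parser.HTMLParser, which is not a validating SGML parser and will not infer omitted end tags.

```python
from html.parser import HTMLParser


class ParagraphExtractor(HTMLParser):
    """Collect the text of each <p> element in a CES Level 1 style file.

    Assumption: paragraphs are marked <p>...</p> with explicit end tags.
    The real ELRA-W0022 markup may differ (e.g. omitted end tags).
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.paragraphs = []
        self._buffer = None  # None while outside a <p> element

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "p" and self._buffer is not None:
            # Normalise internal whitespace and line breaks to single spaces.
            self.paragraphs.append(" ".join("".join(self._buffer).split()))
            self._buffer = None

    def handle_data(self, data):
        # Text inside nested elements (e.g. a "distinct" passage) is kept.
        if self._buffer is not None:
            self._buffer.append(data)


def extract_paragraphs(sgml_text):
    """Return one whitespace-normalised string per <p> element."""
    parser = ParagraphExtractor()
    parser.feed(sgml_text)
    parser.close()
    return parser.paragraphs


if __name__ == "__main__":
    sample = "<text><body><p>Πρώτη παράγραφος.</p>\n<p>Δεύτερη\n  παράγραφος.</p></body></text>"
    print(extract_paragraphs(sample))
```

To use it on a real file, read the file with whatever character encoding the distribution documents and pass the resulting string to extract_paragraphs; text outside any <p> element is ignored.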

A subset of the corpus (250,000 words) is morpho-syntactically tagged; all the words are also lemmatised and checked. The morphosyntactic annotation followed a four-step procedure: (1) automatic morphosyntactic annotation, (2) automatic disambiguation, (3) manual disambiguation and checking, and (4) conversion to the PAROLE format. In certain texts, some passages are written in "katharevoussa", an older, archaizing form of Greek; these passages are marked as "distinct" and have not been morpho-syntactically annotated.
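The four annotation steps can be pictured as a small pipeline. The following Python sketch is illustrative only: the lexicon, the tag names (ART, PRON, NOUN, VERB), the single contextual rule and the tab-separated output are hypothetical placeholders, not the ILSP tools, the PAROLE tagset or the PAROLE SGML format. It shows the control flow of annotate, disambiguate automatically, check manually and convert, including how tokens inside a "distinct" (katharevoussa) passage are left unannotated.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Analysis = Tuple[str, str]  # (tag, lemma)

# Hypothetical toy lexicon with placeholder tags, NOT the real PAROLE tagset.
LEXICON: Dict[str, List[Analysis]] = {
    "είδα": [("VERB", "βλέπω")],
    "το": [("ART", "ο"), ("PRON", "αυτός")],  # ambiguous: article or pronoun
    "σπίτι": [("NOUN", "σπίτι")],
}


@dataclass
class Token:
    form: str
    distinct: bool = False  # inside a passage marked "distinct" (katharevoussa)
    candidates: List[Analysis] = field(default_factory=list)
    analysis: Optional[Analysis] = None  # final (tag, lemma), if resolved


def annotate(tokens: List[Token]) -> List[Token]:
    """Step 1: automatic annotation, listing every analysis the lexicon allows."""
    for tok in tokens:
        if not tok.distinct:  # "distinct" passages are skipped entirely
            tok.candidates = list(LEXICON.get(tok.form.lower(), []))
    return tokens


def disambiguate(tokens: List[Token]) -> List[Token]:
    """Step 2: automatic disambiguation with a single toy contextual rule."""
    for i, tok in enumerate(tokens):
        if len(tok.candidates) == 1:
            tok.analysis = tok.candidates[0]
        elif len(tok.candidates) > 1 and i + 1 < len(tokens):
            next_tags = [tag for tag, _ in tokens[i + 1].candidates]
            # Toy rule: choose ART when the next word can only be a noun.
            if next_tags == ["NOUN"]:
                tok.analysis = next((c for c in tok.candidates if c[0] == "ART"), None)
    return tokens


def manual_check(tokens: List[Token], corrections: Dict[int, Analysis]) -> List[Token]:
    """Step 3: manual disambiguation and checking, modelled as index -> analysis fixes."""
    for i, analysis in corrections.items():
        tokens[i].analysis = analysis
    return tokens


def convert(tokens: List[Token]) -> List[str]:
    """Step 4: conversion; a tab-separated stand-in for the PAROLE output format."""
    lines = []
    for tok in tokens:
        if tok.distinct:
            lines.append(f"{tok.form}\tDISTINCT\t-")
        elif tok.analysis is not None:
            lines.append(f"{tok.form}\t{tok.analysis[0]}\t{tok.analysis[1]}")
        else:
            lines.append(f"{tok.form}\tUNRESOLVED\t-")  # left for a human
    return lines


def run_pipeline(tokens: List[Token], corrections: Optional[Dict[int, Analysis]] = None) -> List[str]:
    return convert(manual_check(disambiguate(annotate(tokens)), corrections or {}))


if __name__ == "__main__":
    sentence = [Token("Είδα"), Token("το"), Token("σπίτι"), Token("οίκος", distinct=True)]
    for line in run_pipeline(sentence):
        print(line)
```

Tokens that the automatic rule cannot resolve come out as UNRESOLVED, which is the point at which step 3 (here, the corrections dictionary) would supply the checked analysis.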

The tagset used for the morphological annotation of the corpus is presented in the "Addendum to TA - Encoding features and values for the morphological layer in the lexicon Merged Tags" (P-WP1.1.-MEMO-ERLI-5).

More information about the PAROLE project: http://www.elda.org/catalogue/fr/text/doc/parole.html



Distribution

Availability

Available - Restricted Use

Start date: 09/03/2000

Licence

ELRA END USER (all offers)

Restrictions: Academic - Non Commercial Use (all offers)

User nature   ELRA membership   Fee
Academic      Member            850.00
Commercial    Member            850.00
Commercial    Non-member        1,275.00
Academic      Non-member        1,275.00

Contact Person

Mapelli Valérie

text

Monolingual text corpus

Languages

Greek, Modern (1453-)

Linguality

Linguality type: Monolingual

Size

Approximately 3 million words

Resource Creation

Funding Project

PAROLE

Funding type: EU funds

Metadata

Created: 12/05/2005

Version

Version: 1.0

Last Updated: 12/05/2004
