PAROLE Portuguese Corpus - complete version

Corpus PAROLE du portugais - version complète

ID: ELRA-W0024_01

The PAROLE Portuguese Corpus contains approximately 3 million running words of European Portuguese, distributed by medium as follows:

* Newspaper: about 65%, from 3 titles covering the period 1996-1997;
* Book: about 20%, from 12 titles published by 3 publishing houses;
* Periodical: about 5%, from 7 weekly issues of 1 title, 1996;
* Miscellaneous: about 10%, from several files distributed across 8 titles.
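
The percentages above can be turned into rough word counts per medium. The following is a minimal sketch in Python, assuming the rounded figures given in this description (a total of about 3 million words and the stated shares); the names `TOTAL_WORDS`, `SHARES` and `approx_words_by_medium` are illustrative only and are not part of the corpus distribution. The results are estimates, not official counts.

```python
# Approximate corpus size and shares by medium, as stated in the description.
# These are rounded figures, so the results are estimates only.
TOTAL_WORDS = 3_000_000
SHARES = {  # percentage of running words per medium
    "Newspaper": 65,
    "Book": 20,
    "Periodical": 5,
    "Miscellaneous": 10,
}


def approx_words_by_medium(total, shares):
    """Return an estimated word count per medium from percentage shares."""
    if sum(shares.values()) != 100:
        raise ValueError("shares must add up to 100%")
    # Integer arithmetic avoids floating-point rounding noise.
    return {medium: total * pct // 100 for medium, pct in shares.items()}


counts = approx_words_by_medium(TOTAL_WORDS, SHARES)
for medium, n in counts.items():
    print(f"{medium}: about {n:,} words")
```

Calling the same function with a different total, for example `approx_words_by_medium(250_000, SHARES)`, gives the corresponding estimate for a smaller sample that follows the same distribution.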

The corpus was classified and encoded according to the common core PAROLE encoding standard. The files are in SGML format.

A subcorpus of the PAROLE Portuguese Corpus, which approximately reproduces the distribution by medium of the whole corpus (Newspaper: about 65%, Book: about 20%, Periodical: about 5%, Miscellaneous: about 10%), is also available.

It contains about 250,000 words, morpho-syntactically tagged according to the PAROLE common tagset and morpho-syntactic annotation standards. Disambiguation was manually checked.

Le corpus PAROLE du portugais contient environ 3 millions de mots courants du portugais européen, regroupés selon le type de supports :

* Journaux: environ 65%, tirés de 3 titres couvrant la période 1996-1997;
* Ouvrages: environ 20%, tirés de 12 titres de trois maisons d'édition différentes;
* Périodiques: environ 5%, tirés de 7 numéros d'un hebdomadaire, 1996;
* Divers: environ 10%, provenant de 7 fichiers fournis par 8 titres.

Le corpus a été classé et annoté conformément au noyau commun du standard de codage PAROLE. Il est au format SGML.

Un sous-ensemble du corpus PAROLE du portugais en reprend le schéma de distribution par support (Journaux: env. 65%, Ouvrages: env. 20%, Périodiques: env. 5%, divers: env. 10%). Il contient 250 000 mots étiquetés au niveau morpho-syntaxique conformément aux standards d'étiquetage et d'annotation morpho-syntaxique PAROLE. La désambiguïsation a été vérifiée manuellement. Les fichiers sont au format SGML.
