PAROLE Portuguese Corpus - complete version
Corpus PAROLE du portugais - version complète
ID: ELRA-W0024_01
The PAROLE Portuguese Corpus contains approximately 3 million running words of European Portuguese, distributed by medium as follows:
* Newspaper: about 65%, drawn from 3 titles covering the period 1996-1997;
* Book: about 20%, drawn from 12 titles from 3 publishing houses;
* Periodical: about 5%, drawn from 7 weekly issues of 1 title (1996);
* Miscellaneous: about 10%, drawn from several files from 8 titles.
The corpus was classified and encoded according to the PAROLE common core encoding standard. The corpus files are in SGML format.
A subcorpus of the PAROLE Portuguese Corpus, which approximately reproduces the distribution by medium of the whole corpus (Newspaper: about 65%, Book: about 20%, Periodical: about 5%, Miscellaneous: about 10%), is also available.
It contains about 250,000 words, morpho-syntactically tagged according to the PAROLE common tagset and morpho-syntactic annotation standards. Disambiguation was manually checked.
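As a rough illustration of the distribution above, the following sketch converts the stated percentages into approximate word counts for the full corpus and the subcorpus. The function name `estimate_words` is hypothetical, and because every figure in the description is given as "about", the results are order-of-magnitude estimates, not actual counts from the corpus files.

```python
# Approximate share of each medium, in percent, as stated in the description.
SHARES_PERCENT = {
    "Newspaper": 65,
    "Book": 20,
    "Periodical": 5,
    "Miscellaneous": 10,
}


def estimate_words(total_words, shares=SHARES_PERCENT):
    """Return a rough word count per medium for a corpus of total_words.

    Hypothetical helper: it only applies the published percentages and
    does not read the corpus itself.
    """
    return {medium: total_words * pct // 100 for medium, pct in shares.items()}


full_corpus = estimate_words(3_000_000)  # about 3 million running words
subcorpus = estimate_words(250_000)      # about 250,000 tagged words

for medium in SHARES_PERCENT:
    print(f"{medium:<14}{full_corpus[medium]:>12,}{subcorpus[medium]:>10,}")
```

Under these assumptions, the newspaper component would amount to roughly 1.95 million words in the full corpus and roughly 160,000 words in the tagged subcorpus.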
Le corpus PAROLE du portugais contient environ 3 millions de mots courants du portugais européen, regroupés selon le type de supports :
* Journaux: environ 65%, tirés de 3 titres couvrant la période 1996-1997;
* Ouvrages: environ 20%, tirés de 12 titres de trois maisons d'édition différentes;
* Périodiques: environ 5%, tirés de 7 numéros d'un hebdomadaire, 1996;
* Divers: environ 10%, provenant de 7 fichiers fournis par 8 titres.
Le corpus a été classé et annoté conformément au noyau commun du standard de codage PAROLE. Il est au format SGML.
Un sous-ensemble du corpus PAROLE du portugais en reprend le schéma de distribution par support (Journaux: env. 65%, Ouvrages: env. 20%, Périodiques: env. 5%, divers: env. 10%). Il contient 250 000 mots étiquetés au niveau morpho-syntaxique conformément aux standards d'étiquetage et d'annotation morpho-syntaxique PAROLE. La désambiguïsation a été vérifiée manuellement. Les fichiers sont au format SGML.