PAROLE Portuguese Annotated Corpus

LE-PAROLE

http://www.clul.ul.pt/en/research-teams/197-le-parole

The tagged subset of the PAROLE Portuguese Corpus contains 250,000 tokens. It is drawn from the full PAROLE Portuguese Corpus of 3 million running words of European Portuguese. The corpus was classified and encoded according to the PAROLE common core encoding standard.
The tagged subset approximately reproduces the distribution by medium of the full corpus (Newspaper: about 65%, Book: about 20%, Periodical: about 5%, Miscellaneous: about 10%). It has been morphosyntactically tagged according to the PAROLE common tagset and morphosyntactic annotation standards, and the disambiguation was manually checked.
The corpus was tagged in a collaboration between two Portuguese institutions: the Centre of Linguistics of the University of Lisbon and INESC-ID.

Distribution

Availability

Available - Restricted Use

Licence

ELRA END USER, ELRA VAR

Restrictions: Academic - Non Commercial Use

User Nature: Academic

Distribution Access/Medium: CD-ROM

Licensors:

Isabel Trancoso

Amália Mendes

Distribution rights holders:

IPR Holder

Contact Persons

text

Monolingual text corpus

Languages

Portuguese (250,000 Tokens)

Linguality

Linguality type: Monolingual

Size

250,000 Tokens

Language Script: pt-PT

Text Format

SGML (250,000 Tokens)

Character encoding

UTF-8

Domains

general (250,000 Tokens)

Modalities

Written Language

Classification

Text type: Newspaper, Text genre: Written (162,500 Tokens)

Text type: Book, Text genre: Written (50,000 Tokens)

Text type: Periodical, Text genre: Written (12,500 Tokens)

Text type: Miscellaneous, Text genre: Written (25,000 Tokens)
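As a quick consistency check, the per-medium token counts in the classification can be compared with the approximate percentages given in the description (65% newspaper, 20% book, 5% periodical, 10% miscellaneous of 250,000 tokens). The short Python sketch below only restates figures from this record; the names `SHARE_PERCENT`, `LISTED_TOKENS`, `expected_tokens` and `check_distribution` are hypothetical and not part of any PAROLE or ELRA tool. It uses integer arithmetic so the comparison is exact.

```python
# Hypothetical consistency check: compare the token counts listed under
# Classification with the approximate percentages quoted in the description.
# All figures are copied from this catalog record.

TOTAL_TOKENS = 250_000

# Approximate share of each medium, in percent (from the description).
SHARE_PERCENT = {
    "Newspaper": 65,
    "Book": 20,
    "Periodical": 5,
    "Miscellaneous": 10,
}

# Token counts listed under Classification.
LISTED_TOKENS = {
    "Newspaper": 162_500,
    "Book": 50_000,
    "Periodical": 12_500,
    "Miscellaneous": 25_000,
}


def expected_tokens(total: int, percent: int) -> int:
    """Token count implied by a whole-number percentage (integer arithmetic)."""
    return total * percent // 100


def check_distribution() -> dict[str, tuple[int, int]]:
    """Return {medium: (expected, listed)} after checking both totals."""
    # The percentages should cover the whole subset.
    assert sum(SHARE_PERCENT.values()) == 100
    # The listed counts should add up to the stated corpus size.
    assert sum(LISTED_TOKENS.values()) == TOTAL_TOKENS
    result = {}
    for medium, percent in SHARE_PERCENT.items():
        result[medium] = (expected_tokens(TOTAL_TOKENS, percent), LISTED_TOKENS[medium])
    return result


if __name__ == "__main__":
    for medium, (expected, listed) in check_distribution().items():
        status = "ok" if expected == listed else "MISMATCH"
        print(f"{medium:<13} expected={expected:>7} listed={listed:>7} {status}")
```

With the figures in this record, each listed count equals the count implied by its rounded percentage, and the four counts sum to the 250,000-token total.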

Time Coverage

1996-1997 (250,000 Tokens)

Geographic coverage

pt (250,000 Tokens)

Creation

Creation mode details: POS tagging was done automatically with the INESC-ID tool "Palavroso". Disambiguation was then performed by linguists, and the POS annotation was manually verified.

Creation mode: Mixed

Creation Tools

PALAVROSO

Resource Creation

Resource Creator

Isabel Trancoso

Maria Fernanda Bacelar do Nascimento

Creation lasted: 01/04/1996 - 31/03/1998

Funding Project

LE-PAROLE

URLs: http://www.clul.ul.p..., http://www.elda.org/..., ftp://ftp-tei.uic.edu/pub/tei/app/le02.html

Funding Type: EU funds

Funders: European Commission, DG XIII, Telematics Applications of Common Interest, Contract LE2-4017

Funding Country: European Commission

Project duration: 01/04/1996 - 31/03/1998

Metadata

Created: 16/07/2012

Last Updated: 11/12/2015

Metadata Creator

Amália Mendes

Usage

Access tools

http://www.elda.fr/c...

Foreseen Use

NLP Applications

Use NLP Specific: Lemmatization, Lexicon Access, Morphosyntactic Tagging, Pos Tagging

Human Use

Use NLP Specific: Linguistic Research

Actual Use - NLP Applications

Use NLP Specific: Lemmatization, Lexicon Access, Morphosyntactic Tagging, Pos Tagging

Usage Report

Document Type: In Proceedings

Bacelar do Nascimento, M. F., P. Marrafa, L. A. S. Pereira, R. Ribeiro, R. Veloso, L. Wittmann. "LE-PAROLE - Do corpus à modelização da informação lexical num sistema-multifunção" [LE-PAROLE: From corpus to the modelling of lexical information in a multifunction system]. In XIII Encontro Nacional da Associação Portuguesa de Linguística, pp. 115-134, 1998.

Editor: Associação Portuguesa de Linguística

Publisher: Colibri Artes Gráficas

Actual Use - Human Use

Use NLP Specific: Linguistic Research

Usage Report

Document Type: In Proceedings

Editor: Associação Portuguesa de Linguística

Publisher: Colibri Artes Gráficas

Documentation

Document Type: In Proceedings

Editor: Associação Portuguesa de Linguística

Publisher: Colibri Artes Gráficas
