The PTPARL Corpus contains approximately 975,806 running words of European Portuguese. It comprises 1,076 texts, which are adapted transcriptions of Portuguese parliament sessions made available in 2004.
Généreux, M., I. Hendrickx, A. Mendes, A Large Portuguese Corpus On-Line: Cleaning and Preprocessing, http://www.propor201..., pp. 113-120, 10th International Conference PROPOR2012, 2012
Editor: Caseli, H. et al. (eds.)
Publisher: Heidelberg: Springer-Verlag
Keywords: Corpus cleaning, PoS Tagging, Lemmatization
Document Language: English
Lemmatization
Segmentation level: Word
Annotation Mode: Automatic
Size: 975,806 Tokens
Annotation Manual:
Document Type: In Proceedings