Bilingual text corpus
Languages
Romanian (757,550 words)
Language script: Latn
English (843,832 words)
Language script: Latn
Linguality
Linguality type: Bilingual
Multi-linguality type: Parallel
Size
Character encoding: UTF-8
Modalities
Classification
Text type: political
Register: journalistic text
Text genre: news
Annotation
Segmentation
StandOff: False
Segmentation level: Sentence, Word
Format: http://www.xces.org
Standard practices conformance: XCES
Annotation Mode: Mixed
Lemmatization
StandOff: False
Format: http://www.xces.org
Standard practices conformance: XCES
Annotation Mode: Mixed (The corpus has been automatically lemmatized and then manually corrected.)
Morphosyntactic Annotation - POS Tagging
Tagset: http://nl.ijs.si/ME/V3/msd/html/
StandOff: False
Format: http://www.xces.org
Standard practices conformance: XCES
Theoretic Model: http://www.aclweb.org/anthology-new/A/A00/A00-1031.pdf
Annotation Mode: Mixed (The corpus has been automatically POS tagged and then manually corrected.)
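Because all three annotation layers are inline (StandOff: False) and segmented at sentence and word level, each word token can in principle be read together with its lemma and morphosyntactic tag in a single pass. The following Python sketch illustrates this on a made-up fragment. The element and attribute names (`s`, `w`, `lemma`, `ana`), the sentence identifier, and the tag values are assumptions modeled on common XCES-style inline markup, not taken from the corpus files, so they should be checked against the actual data before use.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample. The element names (text, s, w), the attribute names
# (id, lemma, ana), and the tag values are illustrative assumptions only.
SAMPLE = """<text lang="en">
  <s id="en.1">
    <w lemma="the" ana="Dd">The</w>
    <w lemma="minister" ana="Ncnp">ministers</w>
    <w lemma="meet" ana="Vmis">met</w>
  </s>
</text>"""


def read_sentences(xml_text):
    """Return a list of (sentence_id, [(form, lemma, tag), ...]) pairs."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("s"):
        # Each word element carries its lemma and tag as inline attributes.
        tokens = [(w.text, w.get("lemma"), w.get("ana")) for w in s.iter("w")]
        sentences.append((s.get("id"), tokens))
    return sentences


sentences = read_sentences(SAMPLE)
for sentence_id, tokens in sentences:
    print(sentence_id, tokens)
```

For a parallel corpus of this size, a streaming parser such as `xml.etree.ElementTree.iterparse` may be preferable to loading a whole file into memory.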
Time Coverage
2003