Monolingual text corpus
Languages
Polish
Linguality
Linguality type: Monolingual
Annotation
Segmentation
Tagset: NKJP tagset
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 01/04/2011
End date: 30/04/2011
Lemmatization
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 01/04/2011
End date: 30/04/2011
Segmentation
StandOff: True
Segmentation level: Sentence
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 01/04/2011
End date: 30/04/2011
Semantic Annotation - Word Senses
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Manual (manually disambiguated using AnotEk)
Start date: 01/04/2011
End date: 19/11/2011
Morphosyntactic Annotation - B POS Tagging
Tagset: NKJP tagset
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 01/04/2011
End date: 30/04/2011
Morphosyntactic Annotation - POS Tagging
Tagset: NKJP tagset
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic
Start date: 01/04/2011
End date: 30/04/2011
Segmentation
StandOff: True
Segmentation level: Paragraph
Format: text/xml
Standard practices conformance: TEI
Start date: 01/04/2011
End date: 30/04/2011
Creation
Creation mode: Mixed
Creation mode details: Texts from economy-related categories of the Polish Wikipedia, including their economy-related subcategories. Wikipedia annotations were stripped, and the texts were tagged with TaKIPI 1.8 and converted to TEI format.
Original Sources
Creation Tools
Java code
AnotEk 1.0
TaKIPI 1.8