The Polish Coreference Corpus (PL: Polski Korpus Koreferencyjny) is a result of the "Computer-based methods for coreference resolution in Polish texts" project. It contains short fragments (250-350 segments each) of texts randomly selected from the full version of the National Corpus of Polish (NKJP), preserving the original balance of text types. These fragments are manually annotated with identity coreference chains and quasi-identity relations. The corpus is supplied in two XML-based formats: MMAX and TEI. Both include automatic morphosyntactic annotation; the TEI version also includes automatic named entity and shallow parsing annotations.
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic (Morfeusz outputs MSD and POS tag variants, i.e. all available morphosyntactic interpretations, which the Pantera tagger then disambiguates)
Annotation Tools:
Morfeusz SGJP, a tokenizer, morphological analyzer and lemmatizer for Polish
Pantera, a Brill tagger for Polish
Start date: 01/01/2012
Semantic Annotation - Entity Mentions
StandOff: True
Segmentation level: Word Group
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Mixed (manual annotation with automatic preannotation)
Start date: 01/01/2012
Discourse Annotation - Coreference
StandOff: True
Segmentation level: Other
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Mixed (manual annotation with automatic preannotation)
Start date: 01/01/2012
Structural Annotation
StandOff: True
Segmentation level: Word
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic (syntactic words, i.e. word-like compounds, detected by Spejd with the NKJP shallow parsing grammar; see the NKJP documentation for details)
Annotation Tools:
Spejd, a shallow parser of Polish
Start date: 01/01/2012
Syntactic Annotation - Shallow Parsing
StandOff: True
Segmentation level: Word Group
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Automatic (syntactic groups, i.e. phrase-like constructs, detected by Spejd with the NKJP shallow parsing grammar; see the NKJP documentation for details)
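
Every layer above is marked as stand-off: an annotation does not wrap the text itself but points at units defined in a lower layer (mentions point at word-level segments, coreference clusters point at mentions). The following Python sketch illustrates that idea only. All class names, identifiers and the sample sentence are hypothetical and do not reproduce the corpus' actual TEI or MMAX schema.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


# Hypothetical in-memory model of stand-off layers (NOT the corpus schema).
@dataclass(frozen=True)
class Segment:
    seg_id: str   # identifier in the word-level segmentation layer
    orth: str     # surface form of the segment


@dataclass(frozen=True)
class Mention:
    mention_id: str
    segment_ids: Tuple[str, ...]  # pointers into the segmentation layer


def mention_text(mention: Mention, segments: Dict[str, Segment]) -> str:
    """Resolve a mention's pointers to the surface text of its segments."""
    return " ".join(segments[s].orth for s in mention.segment_ids)


def chain_texts(chain: List[str],
                mentions: Dict[str, Mention],
                segments: Dict[str, Segment]) -> List[str]:
    """Resolve a coreference chain (a list of mention ids) to mention texts."""
    return [mention_text(mentions[m], segments) for m in chain]


# Illustrative data: "Maria kupiła nowy rower. Ona go lubi."
words = ["Maria", "kupiła", "nowy", "rower", ".", "Ona", "go", "lubi", "."]
segments_by_id = {
    f"seg_{i}": Segment(f"seg_{i}", w) for i, w in enumerate(words, start=1)
}

mentions_by_id = {
    "m1": Mention("m1", ("seg_1",)),          # Maria
    "m2": Mention("m2", ("seg_3", "seg_4")),  # nowy rower (a word group)
    "m3": Mention("m3", ("seg_6",)),          # Ona
    "m4": Mention("m4", ("seg_7",)),          # go
}

# Identity chains refer to mentions by id only; no text is duplicated here.
chains = [["m1", "m3"], ["m2", "m4"]]

for chain in chains:
    print(chain_texts(chain, mentions_by_id, segments_by_id))
```

Because each layer stores only pointers, the segmentation can carry several independent annotation layers at once, which is consistent with the "Word" and "Word Group" segmentation levels listed in the record.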