Polish Coreference Corpus

ID: 438

The Polish Coreference Corpus (PL: Polski Korpus Koreferencyjny) is the result of the "Computer-based methods for coreference resolution in Polish texts" project. It consists of short fragments (250-350 segments each) of texts randomly selected from the full version of the National Corpus of Polish, preserving the original balance of text types. These fragments are manually annotated with identity coreference chains and quasi-identity relations. The corpus is distributed in two XML-based formats, MMAX and TEI. It includes automatic morphosyntactic annotation; the TEI version also includes automatic named entity and shallow parsing annotation.

  • Nerf, a named entity recognizer for Polish
  • Pantera, a Brill tagger for Polish
  • Morfeusz SGJP, a tokenizer, morphological analyzer and lemmatizer for Polish
  • Spejd, a shallow parser of Polish