Pęzik P., Ogrodniczuk M., Przepiórkowski A. (2011). Parallel and spoken corpora in an open repository of Polish language resources. Human Language Technologies as a Challenge for Computer Science and Linguistics. LTC Poznań 2011.
ID: 512
A collection of Polish corpora aligned at the word level using the GIZA++ word aligner. It is available both in a TEI P5-compliant format and as a logical dump of a relational database. Sentence-level structural annotation and alignment confidence scores are also provided. Different parts of this resource are available under different licences; please see the appropriate headers for details.