PELCRA Word Aligned Corpora PELCRA_WD_ALIGN
ID: 512
A collection of Polish corpora aligned at the word level using the GIZA++ word aligner. It is available both in a TEI P5-compliant format and as a relational database logical dump. Sentence-level structural annotation is provided, as well as alignment confidence scores. Different parts of this resource are available under different licences; please see the appropriate headers for details.
Distribution
Availability: Available - Restricted Use
Start date: 15/06/2012
Licence: CC BY-NC
Restrictions: Academic - Non Commercial Use, Attribution
User Nature: Academic, Commercial
Download location: hidden
Distribution Access/Medium: Downloadable
Attribution Details: Pęzik P., Ogrodniczuk M., Przepiórkowski A. (2011). Parallel and spoken corpora in an open repository of Polish language resources. Human Language Technologies as a Challenge for Computer Science and Linguistics. LTC Poznań 2011.
Distribution rights holders:
IPR Holder
Contact Person
Bilingual text corpus
Languages
Polish (34,416,872 words)
Language Script: Latn
English (40,955,095 words)
Language Script: Latn
Linguality
Linguality type: Bilingual
Multi-linguality type: Parallel (Word level alignments)
Text Format
text/xml (77,371,967 words)
Size Character encoding
UTF-8
Modalities
Creation
Creation mode: Mixed
Original Sources
Creation Tools
Resource Creation
Creation lasted: 13/06/2012 - 30/06/2012
Funding
Project: Central and South-East European Resources (CESAR - 271022)
Funding Type: EU Funds
Funder: DG INFSO of the European Commission
Funding Country: European Union
Project duration: 01/02/2011 - 31/01/2013
Metadata
Created: 18/06/2012
Last Updated: 22/01/2013
Metadata Language: English (en)
Version
Version: 1.0
Last Updated: 04/07/2012
Validation
Validated
Type of Validation: Formal
Validation Mode: Automatic
Extent: Full
Usage
Foreseen Use: NLP Applications
Use NLP Specific: Machine Translation
Relation
Relation Type: Source corpora
Documentation
Tool Documentation: Online