Bilingual text corpus
Languages
English
(387,000 Tokens)
Polish
(323,000 Tokens)
Linguality
Linguality type: Bilingual
Multi-linguality type: Parallel (The texts were manually aligned at the sentence level using the MemoQ segment alignment tool, http://kilgray.com/products/memoq.)
Text Format
text/xml
(710,000 Tokens)
Size
Character encoding
UTF-8
(710,000 Tokens)
Domains
Modalities
Annotation
Segmentation
Segmentation level: Sentence
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Mixed
Start date: 01/08/2011
End date: 30/09/2011
Size:
710,000 Tokens
Alignment
Segmentation level: Sentence
Format: text/xml
Standard practices conformance: TEI
Annotation Mode: Manual
Start date: 01/08/2011
End date: 30/09/2011
Size:
710,000 Tokens
Time Coverage
2005-2010
(710,000 Tokens)
Creation
Creation mode details: Semi-automatic acquisition and processing.
Creation mode: Mixed
Original Sources
Creation Tools