gpwEcono corpus
gpwEcono
http://zil.ipipan.waw.pl/gpwEcono
ID:
451
A corpus of Polish-language stock market reports in TEI format, with manual word sense annotation and automatic morphosyntactic annotation.
Distribution
Availability
Available - Unrestricted Use
Licence
GPL
Restrictions:
Share Alike
Fee:
free of charge
Download location:
hidden
Distribution Access/Medium:
Downloadable
Contact Person
Łukasz Kobyliński
http://zil.ipipan.wa...
Research Assistant
Jana Kazimierza 5
01-248 Warsaw
Tel.: +48 22 38 00 559
Fax: +48 22 38 00 510
text
Monolingual text corpus
Languages
Polish
Linguality
Linguality type:
Monolingual
Size
282,366 Words
20 MB
Annotation
Segmentation
Tagset:
NKJP tagset
StandOff:
True
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
TEI
Annotation Mode:
Automatic
Annotation Tools:
TaKIPI 1.8
Start date:
01/02/2010
End date:
28/02/2010
Lemmatization
StandOff:
True
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
TEI
Annotation Mode:
Automatic
Annotation Tools:
TaKIPI 1.8
Start date:
01/02/2010
End date:
28/02/2010
Segmentation
StandOff:
True
Segmentation level:
Sentence
Format:
text/xml
Standard practices conformance:
TEI
Annotation Mode:
Automatic
Annotation Tools:
TaKIPI 1.8
Start date:
01/02/2010
End date:
28/02/2010
Semantic Annotation - Word Senses
StandOff:
True
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
TEI
Annotation Mode:
Manual (disambiguated using AnotEk)
Annotation Tools:
AnotEk
Start date:
01/06/2010
End date:
01/09/2010
Morphosyntactic Annotation - Basic POS Tagging
Tagset:
NKJP tagset
StandOff:
True
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
TEI
Annotation Mode:
Automatic
Annotation Tools:
TaKIPI 1.8
Start date:
01/02/2010
End date:
28/02/2010
Morphosyntactic Annotation - POS Tagging
Tagset:
NKJP tagset
StandOff:
True
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
TEI
Annotation Mode:
Automatic
Annotation Tools:
TaKIPI 1.8
Start date:
01/02/2010
End date:
28/02/2010
Segmentation
StandOff:
True
Segmentation level:
Paragraph
Format:
text/xml
Standard practices conformance:
TEI
Start date:
01/02/2010
End date:
28/02/2010
Creation
Creation mode:
Mixed
Creation mode details:
The corpus was created from stock market reports. Morphosyntactic annotation was produced automatically with the TaKIPI tagger; word senses were annotated manually with AnotEk.
Original Sources
http://www.gpwinfost...
Creation Tools
Java code
AnotEk 1.0
TaKIPI 1.8
Metadata
Created:
23/01/2013
Last Updated:
23/01/2013
Source:
CESAR
Metadata Creator
Łukasz Kobyliński
http://zil.ipipan.wa...
Research Assistant
Jana Kazimierza 5
01-248 Warsaw
Tel.: +48 22 38 00 559
Fax: +48 22 38 00 510
Version
Version:
1.0