META-SHARE Portal
SemCor
SemCor corpus in English and Romanian.
Distribution
Availability:
Available
Licences
Non Standard Licence Terms
Conditions:
Inform Licensor, No Derivatives, No Redistribution
Distribution Details
User Nature:
Academic, Commercial
Distribution Access/Medium:
Accessible Through Interface
Contact Person
Dan Tufiş
http://www.racai.ro/...
Research Institute for Artificial Intelligence, Romanian Academy
RACAI
Director of the Research Institute for Artificial Intelligence, Romanian Academy
Casa Academiei, Calea 13 Septembrie nr. 13, etaj 3, Bucureşti, România, 050711
050711 Bucharest
Romania (RO)
Tel.: 0040 21 3188103
Fax: 0040 21 3188142
http://www.racai.ro/
RACAI
text
Bilingual text corpus
Languages
Romanian; Moldavian; Moldovan (175,603 Tokens)
Language Script:
Latin
English (178,499 Tokens)
Language Script:
Latin
Linguality
Linguality type:
Bilingual
Multi-linguality type:
Parallel
Size
354,102 Tokens
Character encoding
UTF-8
Modalities
Written Language
Annotation
Segmentation
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Lemmatization
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Syntactic Annotation - Constituency Trees
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Morphosyntactic Annotation - POS Tagging
Tagset:
Morpho-Syntactic Descriptors: http://nl.ijs.si/ME/V4/msd/html/index.html
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Theoretic Model:
Hidden Markov Models
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Semantic Annotation - Word Senses
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Annotation Mode:
Manual
Annotation Tools:
WSD Tool
Syntacticosemantic Annotation - Links
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Theoretic Model:
Lexical Attraction Models
Annotation Mode:
Automatic
Annotation Tools:
LexPar
Metadata
Created:
28/11/2011
Last Updated:
01/02/2013
Source:
METANET4U
Documentation
Document Type:
Manual
RACAI, SemCor Corpus, http://ws.racai.ro:9...
Keywords:
SemCor corpus, word sense disambiguation, POS-tagged, lemmatized, chunked, linked
Document Language:
English