Multilingual News Corpus
MLINGNEWS
A parallel legislative news corpus collected from http://ec.europa.eu/ in English, Romanian and French.
Distribution
Availability
Available - Restricted Use
Licence
MS Commons - BY - NC - ND
Restrictions:
Inform Licensor, No Derivatives, No Redistribution
User Nature:
Academic, Commercial
Distribution Access/Medium:
Accessible Through Interface
Attribution Details:
Please cite this paper: 'Radu Ion, Elena Irimia, Dan Ştefănescu, and Dan Tufiş. ROMBAC: The Romanian Balanced Annotated Corpus. In Proceedings of the 8th International Conference on Language Resources and Evaluation (LREC 2012), pp. 339–344, Istanbul, Turkey, May 21-27, 2012. (C) European Language Resources Association (ELRA), ISBN: 978-2-9517408-7-7.'
Contact Person
Dan Tufiș
http://www.racai.ro/...
Research Institute for Artificial Intelligence, Romanian Academy
RACAI, ICIA
Director of the Research Institute for Artificial Intelligence, Romanian Academy
Casa Academiei, Calea 13 Septembrie nr. 13, etaj 3, București, România, 050711
050711 Bucharest
Romania
Tel.: 0040 21 3188103
Fax: 0040 21 3188142
NLP Group
http://www.racai.ro/
RACAI, ICIA
Casa Academiei, Calea 13 Septembrie nr. 13, etaj 3, București, România, 050711
050711 Bucharest
Romania
Tel.: 0040 21 3188103
Fax: 0040 21 3188142
text
Multilingual text corpus
Languages
Romanian (659,031 Tokens)
English (1,334,942 Tokens)
French (1,480,103 Tokens)
Linguality
Linguality type:
Multilingual
Multi-linguality type:
Parallel
Size
3,474,076 Tokens
Character encoding
UTF-8
Modalities
Written Language
Annotation
Segmentation
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Lemmatization
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Syntactic Annotation - Shallow Parsing
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Morphosyntactic Annotation - POS Tagging
Tagset:
Morpho-Syntactic Descriptors: http://nl.ijs.si/ME/V4/msd/html/index.html
StandOff:
False
Segmentation level:
Word
Format:
text/xml
Standard practices conformance:
XCES
Theoretic Model:
Hidden Markov Models
Annotation Mode:
Automatic
Annotation Tools:
TTL Web Service:
http://ws.racai.ro/t...
Metadata
Created:
28/11/2011
Last Updated:
01/02/2013
Source:
METANET4U
Documentation
Document Type:
Manual
Radu Ion, Multilingual News Corpus, http://ws.racai.ro:9...
Keywords:
parallel corpus, XCES, English, Romanian, French, news
Document Language:
English