Last view: 2018-01-29
ACCURAT balanced test corpus for under resourced languages
Collection of parallel sentences for seven under-resourced languages (Croatian, Estonian, Greek, Latvian, Lithuanian, Romanian and Slovenian), plus English and German
Distribution
Availability:
Available
Licences
CC BY 4.0
Distribution Details
Distribution Access/Medium:
Downloadable
Contact Person
Aivars Bērziņš
project manager
Vienibas gatve 75a, Riga
LV 1004 Riga
Latvia (LV)
Tel.: +371-67605001
Fax: +371-67605750
text
Multilingual text corpus
Languages
Modern Greek (1453 - ) (512 Sentences)
Language Script:
Greek
Slovenian (512 Sentences)
Language Script:
Latin
Romanian; Moldavian; Moldovan (512 Sentences)
Language Script:
Latin
Latvian (512 Sentences)
Language Script:
Latin
English (512 Sentences)
Language Script:
Latin
Estonian (512 Sentences)
Language Script:
Latin
Croatian (512 Sentences)
Language Script:
Latin
German (512 Sentences)
Language Script:
Latin
Lithuanian (512 Sentences)
Language Script:
Latin
Linguality
Linguality type:
Multilingual
Multi-linguality type:
Parallel
Size
4,608 Sentences
Character encoding
UTF-8
Modalities
Written Language
Creation
Creation mode:
Manual
Metadata
Created:
07/09/2012
Last Updated:
05/11/2015
Version
Version:
final
Last Updated:
31/12/2010
Validation
Validated
Usage
Foreseen Use
NLP Applications
Use specific to NLP:
Machine Translation
Actual Use - NLP Applications
Use specific to NLP:
Machine Translation