ACCURAT balanced test corpus for under resourced languages

Collection of parallel sentences for seven under-resourced languages (Croatian, Estonian, Greek, Latvian, Lithuanian, Romanian and Slovenian), plus English and German.

Distribution

Availability: Available

Licences

CC-BY 4.0

Distribution Details

Distribution Access/Medium: Downloadable

Contact Person

Aivars Bērziņš

text

Multilingual text corpus

Languages

Modern Greek (1453 - ) (512 Sentences)

Language Script: Greek

Slovenian (512 Sentences)

Language Script: Latin

Romanian; Moldavian; Moldovan (512 Sentences)

Language Script: Latin

Latvian (512 Sentences)

Language Script: Latin

English (512 Sentences)

Language Script: Latin

Estonian (512 Sentences)

Language Script: Latin

Croatian (512 Sentences)

Language Script: Latin

German (512 Sentences)

Language Script: Latin

Lithuanian (512 Sentences)

Language Script: Latin

Linguality

Linguality type: Multilingual

Multi-linguality type: Parallel

Size

4,608 Sentences

Character encoding

UTF-8

Modalities

Written Language

Creation

Creation mode: Manual

Metadata

Created: 07/09/2012

Last Updated: 05/11/2015

Version

Version: final

Last Updated: 31/12/2010

Validation

Validated

Usage

Foreseen Use - NLP Applications

Use specific to NLP: Machine Translation

Actual Use - NLP Applications

Use specific to NLP: Machine Translation
