WMT 2017 Quality Estimation Datasets – phrase-level – META-SHARE

Last update: 2018-02-16

WMT 2017 Quality Estimation Datasets – phrase-level

Bilingual corpora labelled for quality at the phrase level, intended for researchers working on quality estimation or evaluation of machine translation.
7,500 machine translations annotated for quality with binary labels (good/bad) at the phrase level (67,817 phrases in total). Intended for training and testing quality estimation systems.
The corpus consists of source segments in English, their machine translations, a segmentation of these translations into phrases, and a binary score given by humans indicating the quality of each phrase.
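
As a rough illustration of the structure described above, one annotated record could be modelled as follows. This is only a sketch: the class name, field names, label strings and the toy sentence are all hypothetical and are not taken from the distributed files, whose actual format is not described on this page.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PhraseAnnotatedSegment:
    """One record: a source segment, its MT output split into phrases,
    and one binary quality label per phrase (hypothetical layout)."""
    source: str                # source segment
    phrases: List[List[str]]   # machine translation, segmented into phrases of tokens
    labels: List[str]          # one label per phrase: "good" or "bad" (assumed strings)

    def __post_init__(self) -> None:
        # Every phrase must carry exactly one label.
        if len(self.phrases) != len(self.labels):
            raise ValueError("number of labels must match number of phrases")
        unknown = set(self.labels) - {"good", "bad"}
        if unknown:
            raise ValueError(f"unexpected labels: {sorted(unknown)}")

    @property
    def translation(self) -> str:
        # Rebuild the full machine translation from its phrases.
        return " ".join(tok for phrase in self.phrases for tok in phrase)


def bad_phrase_ratio(segments: List[PhraseAnnotatedSegment]) -> float:
    """Fraction of phrases labelled "bad" across all segments."""
    total = sum(len(seg.labels) for seg in segments)
    bad = sum(label == "bad" for seg in segments for label in seg.labels)
    return bad / total if total else 0.0


# Invented toy record, for illustration only (not from the corpus).
example = PhraseAnnotatedSegment(
    source="The printer is not responding .",
    phrases=[["Der", "Drucker"], ["reagiert"], ["nicht", "auf", "."]],
    labels=["good", "good", "bad"],
)
```

With records in this shape, a quality estimation system would predict the `labels` list from `source` and `phrases`, and a helper such as `bad_phrase_ratio` gives a quick view of the label balance in a training split.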

IMPORTANT LEGAL NOTICE
TAUS Terms of Use (https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21).
TAUS grants to QT21 User access to the WMT Data Set with the following rights:
i) the right to use the target side of the translation units into a commercial product, provided that QT21 User may not resell the WMT Data Set as if it is its own new translation;
ii) the right to make Derivative Works; and
iii) the right to use or resell such Derivative Works commercially and for the following goals:
    i) research and benchmarking;
    ii) piloting new solutions; and
    iii) testing of new commercial services.

Distribution

Availability: Available - Restricted Use

Licence: Other

Distribution Access/Medium: Downloadable

Contact Person

text

Bilingual text corpus

Languages

German, English

Linguality

Linguality type: Bilingual

Size

67,817 phrases

Metadata

Created: 13/12/2017

Last Updated: 16/02/2018

Usage

Foreseen Use - Nlp Applications

Use NLP Specific: Machine Translation

Actual Use - Nlp Applications

Use NLP Specific: Machine Translation
