QT21 WMT17 Human Post-Edited data set

Last update: 2018-03-02

http://www.qt21.eu/

https://lindat.mff.cuni.cz/

ID: http://hdl.handle.net/11372/LRT-2390

Training of Automatic Post-editing and Quality Estimation components / Quality Estimation / Error Analysis.
Set of 10,800 Human Post-Edited (HPE) quadruplets for three language pairs on WMT17 news task data. Each quadruplet consists of (source, reference, target, HPE). For each language pair, the target segments were produced on the WMT17 news task by the three best WMT17 systems for that language pair, and each translation engine provided 1,200 segments. The translations (targets) were generated by the following systems:

En-Cz: “1 62.0 0.308 uedin-nmt”, “3 55.9 0.111 limsi-factored-norm”, “54.1 0.050 CU-Chimera”
En-De: “69.8 0.139 uedin-nmt”, “66.7 0.022 KIT”, “66.0 0.003 RWTH-nmt-ensemb”
En-Lv: “54.4 0.196 tilde-nc-nmt-smt”, “50.8 0.075 limsi-fact-norm”, “50.0 0.058 usfd-cons-qt21”

HPEs for En-De were collected by professional translators from Text&Form, En-Lv HPEs by professional translators from Tilde, and En-Cz HPEs by professional translators from Traductera.
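The 10,800 figure follows directly from the description: three language pairs, three systems per pair, and 1,200 segments per system (3 × 3 × 1,200 = 10,800, i.e. 3,600 quadruplets per language pair). The minimal Python sketch below models one record and checks that arithmetic. The class name `HPEQuadruplet`, its field names, and the helper `expected_total` are hypothetical illustrations only; they do not describe the actual file format of the distributed data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HPEQuadruplet:
    # Hypothetical record layout mirroring (source, reference, target, HPE).
    source: str     # English source segment
    reference: str  # WMT17 reference translation
    target: str     # MT output from one of the three systems
    hpe: str        # human post-edit of the target


# Systems per language pair, as named in the description above.
LANGUAGE_PAIRS = {
    "En-Cz": ["uedin-nmt", "limsi-factored-norm", "CU-Chimera"],
    "En-De": ["uedin-nmt", "KIT", "RWTH-nmt-ensemb"],
    "En-Lv": ["tilde-nc-nmt-smt", "limsi-fact-norm", "usfd-cons-qt21"],
}
SEGMENTS_PER_SYSTEM = 1200


def expected_total(pairs: dict, per_system: int) -> int:
    # Hypothetical helper: sum (systems x segments) over all language pairs.
    return sum(len(systems) * per_system for systems in pairs.values())


total = expected_total(LANGUAGE_PAIRS, SEGMENTS_PER_SYSTEM)
print(total)  # 10800
```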

IMPORTANT LEGAL NOTICE (This dataset is provided under the following terms of use)
TAUS Terms of Use (https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21).
TAUS grants to QT21 User access to the WMT Data Set with the following rights:
i) the right to use the target side of the translation units into a commercial product, provided that QT21 User may not resell the WMT Data Set as if it is its own new translation;
ii) the right to make Derivative Works; and
iii) the right to use or resell such Derivative Works commercially and for the following goals:
   i) research and benchmarking;
   ii) piloting new solutions; and
   iii) testing of new commercial services.

Distribution

Availability: Available - Restricted Use

Licence: Other

Distribution Access/Medium: Downloadable

Contact Person

Christian Dugast

text

Bilingual text corpus

Languages: English, Latvian

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel

Size

10,800 Human Post Edited (HPE) triplets (for 3 language pairs)

Bilingual text corpus

Languages: English, German

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel

Size

10,800 Human Post Edited (HPE) triplets (for 3 language pairs)

Bilingual text corpus

Languages: English, Czech

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel

Size

10,800 Human Post Edited (HPE) triplets (for 3 language pairs)

Metadata

Created: 13/12/2017

Last Updated: 02/03/2018

Metadata Creator

Kanella Pouli

Usage

Foreseen Use - NLP Applications

Use NLP Specific: Machine Translation

Actual Use - NLP Applications

Use NLP Specific: Machine Translation
