WMT 2018 Quality Estimation Core Data Set

For WMT 2018 there are six data sets in total (they will be made available at the end of June 2018):
1) English-German SMT: 30,000 segments split into 27,000 for training, 1,000 for development and 2,000 for test.
2) English-German NMT: 30,000 segments split into 27,000 for training, 1,000 for development and 2,000 for test (same source segments as for SMT).
3) English-Latvian SMT: 20,738 segments split into 17,738 for training, 1,000 for development and 2,000 for test.
4) English-Latvian NMT: 20,738 segments split into 17,738 for training, 1,000 for development and 2,000 for test (same source segments as for SMT).
5) English-Czech SMT: 45,000 segments split into 42,000 for training, 1,000 for development and 2,000 for test.
6) German-English SMT: 45,000 segments split into 42,000 for training, 1,000 for development and 2,000 for test.
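
The split sizes listed above can be sanity-checked with a few lines of Python. This is only a minimal sketch: the `Split` type and the dictionary keys are illustrative names, not part of the data set, and only three of the six sets are shown.

```python
from typing import NamedTuple


class Split(NamedTuple):
    """Segment counts for one data set (hypothetical helper type)."""
    total: int
    train: int
    dev: int
    test: int

    def is_consistent(self) -> bool:
        # The three partitions should add up to the stated total.
        return self.train + self.dev + self.test == self.total


# Figures copied from the list above; the keys are illustrative labels only.
SPLITS = {
    "en-lv.smt": Split(total=20_738, train=17_738, dev=1_000, test=2_000),
    "en-cs.smt": Split(total=45_000, train=42_000, dev=1_000, test=2_000),
    "de-en.smt": Split(total=45_000, train=42_000, dev=1_000, test=2_000),
}

for name, split in SPLITS.items():
    if not split.is_consistent():
        raise ValueError(f"split sizes for {name} do not add up")
```

The same check can be run against the line counts of the downloaded files to confirm that nothing was truncated.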
Training, development and test data for English-German and English-Czech consist of already tokenized triplets (source, target and post-edit) from the Information Technology domain. Target sentences were machine-translated with the KIT SMT and NMT systems (German) and a CUNI system (Czech). Post-edits were collected from professional translators by Text and Form (German); the Czech post-editing was subcontracted.
Training, development and test data for German-English and English-Latvian consist of already tokenized triplets (source, target and post-edit) from the Pharma domain. Target sentences were machine-translated with the KIT SMT system (German) and the TILDE NMT and SMT systems (Latvian). Post-edits were collected from professional translators by Text and Form (German) and by TILDE (Latvian).
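
Because the triplets are already tokenized, they can be handled with plain whitespace splitting. The sketch below shows one way to hold them in memory. It assumes that source, target and post-edit are delivered as three line-aligned sequences with one segment per line, which the description above does not state. The `Triplet` class, the `read_triplets` function and the example sentences are invented for illustration and are not taken from the data set.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Triplet:
    """One (source, target, post-edit) segment, stored as token tuples."""
    source: Tuple[str, ...]
    target: Tuple[str, ...]
    post_edit: Tuple[str, ...]

    @property
    def was_edited(self) -> bool:
        # True when the post-editor changed the machine translation.
        return self.target != self.post_edit


def read_triplets(sources: Iterable[str],
                  targets: Iterable[str],
                  post_edits: Iterable[str]) -> List[Triplet]:
    """Pair up three line-aligned inputs (assumed layout: one segment per line)."""
    src, tgt, pe = list(sources), list(targets), list(post_edits)
    if not (len(src) == len(tgt) == len(pe)):
        raise ValueError("the three inputs must have the same number of lines")
    # The text is already tokenized, so splitting on whitespace recovers the tokens.
    return [
        Triplet(tuple(s.split()), tuple(t.split()), tuple(p.split()))
        for s, t, p in zip(src, tgt, pe)
    ]


# Invented toy segments, NOT taken from the data set.
triplets = read_triplets(
    ["Click the OK button .", "Save the file ."],
    ["Klicken Sie auf die Schaltfläche OK .", "Speichern Sie die Datei ."],
    ["Klicken Sie auf OK .", "Speichern Sie die Datei ."],
)
edited = [t for t in triplets if t.was_edited]
```

Open file handles can be passed in place of the lists, since the function only iterates over lines.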

IMPORTANT LEGAL NOTICE
TAUS Terms of Use (https://lindat.mff.cuni.cz/repository/xmlui/page/licence-TAUS_QT21).
TAUS grants to QT21 User access to the WMT Data Set with the following rights:
i) the right to use the target side of the translation units into a commercial product, provided that QT21 User may not resell the WMT Data Set as if it is its own new translation;
ii) the right to make Derivative Works; and
iii) the right to use or resell such Derivative Works commercially and for the following goals:
i) research and benchmarking;
ii) piloting new solutions; and
iii) testing of new commercial services.
