The human evaluation (HE) dataset created for the English to German (EnDe) and English to French (EnFr) MT tasks is a subset of one of the official test sets of the IWSLT 2016 evaluation campaign. The resulting HE sets comprise 600 segments each for EnDe and EnFr, corresponding to around 10,000 words per set. Human evaluation was based on post-editing, i.e. the manual correction of the MT system output, which was carried out by professional translators. Nine primary runs submitted to the evaluation campaign were post-edited for EnDe and five for EnFr.
Data are publicly available through the WIT3 website (wit3.fbk.eu). The release contains 600 segments for each of EnDe and EnFr (about 10K tokens each), together with 9 and 5 different automatic translations, respectively, post-edited by professional translators. The data can be used for analysis of MT quality and for Quality Estimation components.