The human evaluation (HE) dataset created for the Dutch-to-German (NlDe) and Romanian-to-Italian (RoIt) MT tasks was a subset of the official test set of the IWSLT 2017 evaluation campaign.
The resulting HE sets are composed of 603 segments for both NlDe and RoIt, each corresponding to around 10,000 words. Human evaluation was based on post-editing, i.e. the manual correction of the MT system output, which was carried out by professional translators.
For each of the two tasks, nine primary runs submitted to the evaluation campaign were post-edited; the engines were trained under constrained data conditions in bilingual, multilingual, or zero-shot mode.
Data will be publicly available through the WIT3 website (wit3.fbk.eu). The dataset contains 603 segments for each of NlDe and RoIt (about 10K tokens each). For each direction, 9 different automatic translations were post-edited by professional translators.
Usage: for the analysis of MT quality and for Quality Estimation components.
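As an illustration of how post-edited data of this kind can support MT quality analysis, the sketch below computes a simple word-level edit rate between an MT output and its post-edit. It is a minimal example under stated assumptions: the function names `word_edit_distance` and `edit_rate` and the sentence pair are hypothetical, and the measure is plain Levenshtein distance over whitespace-separated tokens (no block shifts), not necessarily the metric used in the evaluation campaign.

```python
def word_edit_distance(hyp, ref):
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,         # delete a hypothesis word
                            curr[j - 1] + 1,     # insert a reference word
                            prev[j - 1] + cost)) # match or substitute
        prev = curr
    return prev[-1]


def edit_rate(mt_output, post_edit):
    """Edits needed to turn the MT output into its post-edit, per post-edit word."""
    hyp = mt_output.split()
    ref = post_edit.split()
    if not ref:
        return 0.0 if not hyp else float("inf")
    return word_edit_distance(hyp, ref) / len(ref)


# Hypothetical sentence pair, not taken from the dataset.
mt = "das ist ein klein Test"
pe = "das ist ein kleiner Test"
print(edit_rate(mt, pe))  # 0.2 (one substitution over five post-edit words)
```

A lower value means the translator had to change less of the MT output; averaging over the segments of one run would give a rough per-system figure.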