These are the test sets for the WMT shared translation task. They are small parallel data sets used for testing MT systems, and are typically created by translating a selection of crawled articles from online news sites.
The 2018 test sets will be available at the end of June 2018.
Cracker contributed to the German-English and Czech-English test sets from 2015 to 2018, as well as to a different guest language pair in each of these years. The guest language pair for 2018 is English-Finnish.
We also included Russian, Turkish, Chinese, Estonian and Kazakh with funding from other sources.
The source data are crawled from online news sites and remain subject to the licensing conditions of the respective sites.