This resource contains a table of machine-translated segments that exhibit errors. Its purpose is to provide a set of segments showing typical MT issues, so that MT developers can compare how well their systems perform on the same input.
It consists of two parts:
1. English→German Test Suite.
2. German→English Test Suite.
Each test suite contains two sorts of segments:
1. Segments from QTLaunchPad corpora or other resources. These segments were selected from annotated QTLaunchPad corpora to illustrate particular issue types. In addition, segments from various other sources were selected to demonstrate common problems.
2. Segments from the TSNLP Grammar Test Suite for English. To prepare this corpus, all of the “grammatical” segments from TSNLP for the appropriate source language were reviewed. Because TSNLP was designed to provide challenging cases for grammar checkers rather than for MT testing, a team of two native-speaker linguists evaluated all segments for each language, and only those segments that both reviewers agreed were truly grammatical were used. Sentence fragments were also removed, since isolated fragments pose particular problems even for human translators. The resulting set of sentences was then translated using four leading commercial MT systems (two SMT and two RbMT). Sentences that proved problematic for both systems of a given MT type were classified as exhibiting a barrier for that system type.
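The barrier rule described above (a sentence counts as a barrier for an MT type only when every system of that type has problems with it) can be sketched as follows. This is a minimal illustration, not the procedure actually used to build the resource: the function name `barrier_types`, the system names, and the idea of reducing each linguist judgment to a single boolean are all assumptions made for the example.

```python
from collections import defaultdict


def barrier_types(judgments):
    """Return the MT types for which every system of that type had problems.

    `judgments` maps a (hypothetical) system name to a pair
    (mt_type, problematic), where `problematic` is True if the
    system's translation of the sentence was judged problematic.
    """
    flags_by_type = defaultdict(list)
    for mt_type, problematic in judgments.values():
        flags_by_type[mt_type].append(problematic)
    # A type is a barrier only if all of its systems were problematic.
    return {mt_type for mt_type, flags in flags_by_type.items() if all(flags)}


# Illustrative judgments for one source sentence (invented data).
example = {
    "smt_a": ("SMT", True),
    "smt_b": ("SMT", True),
    "rbmt_a": ("RbMT", True),
    "rbmt_b": ("RbMT", False),
}
print(barrier_types(example))  # {'SMT'}
```

Here the sentence is a barrier for SMT because both SMT systems failed, but not for RbMT because one RbMT system handled it acceptably.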
For the TSNLP data, one MT result was selected from among the systems that were considered to exhibit barriers. The selected result was the one judged by the group of linguists to come closest to “getting it right”. For the corpus data, the translation in the corpus was used. In both cases the translation was annotated using MQM to identify issues and post-edited to show one possible way to resolve them. (Note that the post-editing was intended to be minimal, with only enough changes to make the sentence grammatical and acceptable. Full post-editing would in many cases result in more substantive changes to sentence structure, but the goal was not to create a stylistically perfect text.)
For the corpus data, no information is provided as to which system type translated the segment, for which system type(s) the segment proved to be a barrier, or the TSNLP class.