WMT18 Quality Estimation Task: Product Reviews – META-SHARE

Last view: 2026-06-02

Last update: 2018-03-07

WMT18 Quality Estimation Task: Product Reviews

This data (available end of June 2018) consists of a selection of product titles and descriptions from the Amazon Product Reviews dataset (http://jmcauley.ucsd.edu/data/amazon/qa/), with a focus on the Sports and Outdoors category. The data was machine-translated by a state-of-the-art off-the-shelf MT system (Bing) and annotated for errors at the word level as follows:
The errors are annotated following the MQM fine-grained typology, which is composed of three major branches: accuracy (the translation does not accurately reflect the source text), fluency (problems in the translation affect the reading of the text) and style (the translation has stylistic problems, such as the use of the wrong register). These branches include more specific issues lower in the hierarchy. Besides being identified and classified according to this typology (by applying a specific tag), each error receives a severity rating that indicates its impact on the overall meaning, style, and fluency of the translation. An error can be minor (if it does not lead to a loss of meaning and does not confuse or mislead the user), major (if it changes the meaning) or critical (if it changes the meaning and carries any type of implication, or could be seen as offensive).
In essence, the annotation process involves the following steps:
- select the error (a unit that comprises all elements that constitute the error): unitising step;
- apply a specific tag (from the error typology): tagging step;
- choose a severity degree: rating step.
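
The three steps above can be pictured as building one small record per error. The following is a minimal sketch in Python, not the format in which the data is actually distributed: the class and function names, the token-index span representation, and the OK/BAD output labels are all assumptions made for illustration, and only the three top-level MQM branches and the three severity levels come from the description.

```python
from dataclasses import dataclass
from enum import Enum


class Branch(Enum):
    """Top-level MQM branches named in the description."""
    ACCURACY = "accuracy"
    FLUENCY = "fluency"
    STYLE = "style"


class Severity(Enum):
    """Severity levels named in the description."""
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


@dataclass(frozen=True)
class ErrorAnnotation:
    """One annotated error (hypothetical layout, not the released format)."""
    start: int          # unitising step: index of the first token in the error
    end: int            # unitising step: index one past the last token
    branch: Branch      # tagging step: tag from the error typology
    severity: Severity  # rating step: severity degree


def word_level_tags(num_tokens, annotations):
    """Project span annotations onto one OK/BAD label per token.

    The OK/BAD label names are an assumption made for this sketch.
    """
    tags = ["OK"] * num_tokens
    for ann in annotations:
        if not 0 <= ann.start < ann.end <= num_tokens:
            raise ValueError(f"span {ann.start}:{ann.end} is out of range")
        for i in range(ann.start, ann.end):
            tags[i] = "BAD"
    return tags


# Hypothetical example: a five-token translation with one major accuracy
# error covering the tokens at positions 1 and 2.
example = [ErrorAnnotation(1, 3, Branch.ACCURACY, Severity.MAJOR)]
print(word_level_tags(5, example))
```

Keeping the span, the tag and the severity in a single record mirrors the order of the annotation steps and makes it easy to derive coarser views, such as per-token labels, from the fine-grained annotation.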

Distribution

Availability

Available - Unrestricted Use

Licence

CC - ZERO

Contact Person

Christian Dugast

text

Bilingual text corpus

Languages

English, French

Linguality

Linguality type: Bilingual

Multi-linguality type: Parallel

Size

100,000 Words

Metadata

Created: 07/03/2018

Last Updated: 07/03/2018

Usage

Foreseen Use - NLP Applications

Use NLP Specific: Machine Translation

Actual Use - NLP Applications

Use NLP Specific: Machine Translation
