CW Corpus

Shardlow, M. (To appear). The CW Corpus: A new resource for evaluating the Identification of Complex Words. In Proceedings of the Second Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR 2013), Sofia, Bulgaria. Association for Computational Linguistics.

The Complex Word (CW) Corpus contains 731 sentences, each with one annotated CW. The sentences were mined from Simple Wikipedia edit histories, and each entry gives an example of a sentence requiring simplification by means of a single lexical edit. The resource is designed primarily for evaluating CW identification systems.


  • Simple Wikipedia