GENIA POS & Term Corpus
ID: f36f8ab51fe04d34a7a78d5c0ceab6e9
A corpus of 2,000 MEDLINE abstracts, collected using the three MeSH terms “human”, “blood cells” and “transcription factors”. The corpus is available in three formats: 1) a text file containing part-of-speech (POS) annotation, based on the Penn Treebank format; 2) an XML file containing inline POS annotation; 3) a “merged” XML file containing inline annotations for both POS and terms.