UIMA/U-Compare GENIA Tokeniser (GENIA Tagger)

Tsuruoka, Y., Tateisi, Y., Kim, J.-D., Ohta, T., McNaught, J., Ananiadou, S. and Tsujii, J. (2005). Developing a Robust Part-of-Speech Tagger for Biomedical Text. Advances in Informatics - 10th Panhellenic Conference on Informatics, pp. 382–392, Springer-Verlag.

Tokenisation is one of the functionalities of the GENIA tagger, which additionally outputs the base forms, part-of-speech tags, chunk tags, and named entity tags. The tagger is specifically tuned for biomedical text such as MEDLINE abstracts.
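As an illustration of the per-token information listed above, the following is a minimal sketch of reading the standalone GENIA tagger's plain-text output. It assumes that output has one token per line with five tab-separated columns (word, base form, part-of-speech tag, chunk tag, named entity tag) and a blank line after each sentence. It does not show how the UIMA component represents these annotations; the `Token` class and `parse_genia_output` function are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Token:
    # Hypothetical container for one line of tagger output.
    word: str
    base: str
    pos: str
    chunk: str
    ne: str


def parse_genia_output(text: str) -> List[List[Token]]:
    """Group tab-separated tagger output into sentences of Tokens.

    Assumes five columns per token line and blank lines between sentences.
    """
    sentences: List[List[Token]] = []
    current: List[Token] = []
    for line in text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence (if any tokens were read).
            if current:
                sentences.append(current)
                current = []
            continue
        fields = line.split("\t")
        if len(fields) != 5:
            raise ValueError(f"expected 5 columns, got {len(fields)}: {line!r}")
        current.append(Token(*fields))
    if current:
        # Handle output that does not end with a blank line.
        sentences.append(current)
    return sentences


if __name__ == "__main__":
    # Illustrative input only; not real tagger output.
    sample = "IL-2\tIL-2\tNN\tB-NP\tB-protein\ngene\tgene\tNN\tI-NP\tI-protein\n.\t.\t.\tO\tO\n\n"
    for sentence in parse_genia_output(sample):
        print([(t.word, t.pos, t.ne) for t in sentence])
```

Keeping each column as a named field makes it straightforward to select just the tokens (the tokenisation result) or any of the additional tag layers.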
The tool is a UIMA component that forms part of the built-in library of components provided with the U-Compare platform (see separate META-SHARE record) for building and evaluating text mining workflows.

  • U-Compare Workbench