MLRS Corpus

142,397 Maltese texts from 10 genres.

The archive “corpus.zip” expands into a folder “corpus” containing the file “tagged.zip”, which in turn expands into the folder “cwb.final”. That folder contains the following files:
• filelist.txt
• malti02.academic.txt
• malti02.law.txt
• malti02.literature.txt
• malti02.metadata.txt
• malti02.misc.txt
• malti02.parl.txt
• malti02.parl.txt.bak
• malti02.press.txt
• malti02.religion.txt
• malti02.speeches.txt
• malti02.web.genral.txt
• malti02.web.wiki.txt
• README.txt
• removed-from-corpus.txt
• tend.txt
• tstart.txt
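
The nested archives can be unpacked with Python's standard zipfile module. The following is a minimal sketch, not part of the distribution. It assumes the paths stored inside the two zips match the layout described above, and it looks up tagged.zip and cwb.final by name rather than relying on exact internal paths. The helpers extract_corpus and genre_files are hypothetical names. genre_files maps each genre to its file and skips malti02.metadata.txt and the malti02.parl.txt.bak copy.

```python
import zipfile
from pathlib import Path
from typing import Dict


def extract_corpus(corpus_zip: str, dest: str) -> Path:
    """Unpack corpus.zip, then the tagged.zip inside it; return the cwb.final folder."""
    dest_dir = Path(dest)
    with zipfile.ZipFile(corpus_zip) as outer:
        outer.extractall(dest_dir)
    # Locate tagged.zip wherever the outer archive placed it (expected: corpus/tagged.zip).
    inner_zip = next(dest_dir.rglob("tagged.zip"))
    with zipfile.ZipFile(inner_zip) as inner:
        inner.extractall(inner_zip.parent)
    # Expected result: a cwb.final folder holding one .txt file per genre.
    return next(p for p in dest_dir.rglob("cwb.final") if p.is_dir())


def genre_files(cwb_final: Path) -> Dict[str, Path]:
    """Map genre names (e.g. 'press', 'web.wiki') to their malti02.<genre>.txt files."""
    prefix, suffix = "malti02.", ".txt"
    return {
        p.name[len(prefix):-len(suffix)]: p
        # The glob excludes malti02.parl.txt.bak because it does not end in .txt.
        for p in sorted(cwb_final.glob("malti02.*.txt"))
        if p.stem != "malti02.metadata"  # metadata is not a genre
    }
```

On the full corpus, genre_files should return ten entries, one per genre, with the keys taken verbatim from the file names (including “web.genral”).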

All texts of a genre are stored in a single .txt file for that genre. Within each file, texts are delimited by the XML tags <t>…</t>, paragraphs by <p>…</p>, and sentences by <s>…</s>. Inside a sentence, each word is on its own line, followed by a tab and its POS tag.
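
The following minimal Python sketch reads this one-word-per-line format. It assumes that tag lines may carry attributes (for example <s id="1">) and that any columns after the POS tag can be ignored. A line counts as a tag only if it starts with “<” followed by a letter or “/”, ends with “>”, and contains no tab, so a literal “<” token is not mistaken for markup. iter_sentences and count_texts are hypothetical helper names. The POS tags in the usage sample are placeholders, not the corpus's actual tagset.

```python
import re
from typing import Iterable, Iterator, List, Tuple

# A structural tag line such as <t>, </p> or <s id="1"> (attributes assumed possible).
TAG_RE = re.compile(r"^<(/?)([A-Za-z]+)[^\t]*>$")


def iter_sentences(lines: Iterable[str]) -> Iterator[List[Tuple[str, str]]]:
    """Yield each <s>...</s> sentence as a list of (word, POS tag) pairs."""
    sentence: List[Tuple[str, str]] = []
    in_sentence = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            continue
        m = TAG_RE.match(line)
        if m:
            closing, name = m.group(1) == "/", m.group(2)
            if name == "s" and not closing:
                sentence, in_sentence = [], True
            elif name == "s" and closing:
                if in_sentence:
                    yield sentence
                in_sentence = False
            continue  # <t>, <p> and their closing tags carry no tokens
        if in_sentence:
            word, _, rest = line.partition("\t")
            pos = rest.split("\t")[0]  # keep only the column right after the word
            sentence.append((word, pos))


def count_texts(lines: Iterable[str]) -> int:
    """Count texts by their opening <t> tags."""
    count = 0
    for raw in lines:
        m = TAG_RE.match(raw.rstrip("\r\n"))
        if m and m.group(1) == "" and m.group(2) == "t":
            count += 1
    return count


sample = ["<t>", "<p>", "<s>", "Il-\tDEF", "kelb\tNOUN", "</s>",
          "<s>", "Bonġu\tINT", "</s>", "</p>", "</t>"]
sentences = list(iter_sentences(sample))
```

Both functions accept any iterable of lines, so a genre file can be streamed without loading it into memory, e.g. with open("cwb.final/malti02.press.txt", encoding="utf-8") as f. The UTF-8 encoding is an assumption. Summing count_texts over the ten genre files should give a figure comparable to the stated total of 142,397 texts.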
