Hungarian Language Processing Tools in NooJ

NooJ

ID: 118

The Hungarian NooJ contains a morphological dictionary based on the more than 60,000 lemmata of the Concise Dictionary of Hungarian Language, with morphological information based on the work of Laszlo Elekfi. The base forms and their morphological information are stored in the .DIC files, and the inflectional rules are described in the .FLX files. From these, NooJ's dictionary compilation function generates the complex inflected forms of nouns and verbs. The result of the compilation is stored in the .NOD files.

With the aid of the .NOD files, complex inflected forms can be recognised in running text, including derived and further inflected words as well as uninflected forms. Words which cannot be inflected are kept in separate dictionaries. As a result, complex suffixed words and compounds can also be recognised when analysing a text. Using the compiled dictionaries and the language-specific syntactic graphs, the tool performs sentence and clause segmentation, POS tagging, NP recognition, predicate identification and the identification of the other sentence constituents (e.g. adverbials). The input may be any raw Hungarian text or any XML text compatible with NooJ, and the output can also be exported in XML format.

NooJ is widely used in Hungarian linguistics and language technology: its uses cover a broad range of morphological, syntactic, lexical, semantic and psychological content analyses. The Hungarian NooJ tools consist of a range of specific dictionaries (.dic files for the basic dictionaries, .nod files for the compiled dictionaries and .flx files for the morphological rules). Each of them was created for a specific type of analysis.
Below is a short description of each of them:

Basic dictionaries (.dic)
noun.dic -- Hungarian nouns supplied with morphological information -- 55000 units
verb_00.dic -- Hungarian verbs supplied with morphological information -- 10000 units
topabbr.dic -- Most frequent Hungarian abbreviations -- 11 tokens
noaffix-nins.dic -- Hungarian words which cannot be inflected -- 1870 units
topprop.dic -- Most frequent proper names -- 28 units

Compiled dictionaries (.nod)
noun.nod -- Compiled NooJ dictionary of Hungarian nouns -- 96777513 words
verb_00.nod -- Compiled NooJ dictionary of Hungarian verbs -- 19059644 words
topabbr.nod -- Most frequent Hungarian abbreviations -- 11 words
noaffix.nod -- Compiled NooJ dictionary of Hungarian words which cannot be inflected -- 1870 words
topprop.nod -- Compiled NooJ dictionary of the most frequent Hungarian proper names -- 28 words

Inflectional rules (.flx)
noun.flx -- Inflectional rules of Hungarian nouns according to their morphological category -- 33000 rules
verb.flx -- Inflectional rules of Hungarian verbs according to their morphological category -- 27900 rules
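
The relationship between the three file types described above (base forms in .dic, inflectional rules in .flx, generated forms in .nod) can be illustrated with a minimal Python sketch. It is a toy model only: the data structures, the paradigm names BACK_A and FRONT_E, the feature labels and the functions compile_dictionary and analyse are hypothetical and do not reproduce the actual NooJ file syntax or the contents of the Hungarian dictionaries.

```python
# Toy stand-in for a .dic file: lemma -> (part of speech, paradigm name).
# The paradigm names are hypothetical, chosen only for this illustration.
DIC = {
    "ház": ("N", "BACK_A"),    # 'house', back-vowel suffixes
    "kert": ("N", "FRONT_E"),  # 'garden', front-vowel suffixes
}

# Toy stand-in for a .flx file: paradigm name -> list of (suffix, features).
FLX = {
    "BACK_A": [("", "Nom+sg"), ("at", "Acc+sg"), ("ban", "Ine+sg"), ("ak", "Nom+pl")],
    "FRONT_E": [("", "Nom+sg"), ("et", "Acc+sg"), ("ben", "Ine+sg"), ("ek", "Nom+pl")],
}


def compile_dictionary(dic, flx):
    """Expand every lemma into its inflected forms.

    The returned mapping (surface form -> list of analyses) plays the role
    that the compiled .nod dictionary plays in the description above.
    """
    compiled = {}
    for lemma, (pos, paradigm) in dic.items():
        for suffix, features in flx[paradigm]:
            compiled.setdefault(lemma + suffix, []).append((lemma, pos, features))
    return compiled


def analyse(token, compiled):
    """Look up a running-text token; unknown tokens get an empty analysis list."""
    return compiled.get(token.lower(), [])


COMPILED = compile_dictionary(DIC, FLX)

for word in ["házban", "Kertek", "asztal"]:
    print(word, analyse(word, COMPILED))
```

In this sketch two lemmata with four rules each expand into eight surface forms. The same multiplication, applied over long chains of Hungarian suffixes, is what makes the compiled noun dictionary so much larger than its 55000-unit source.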
