Khanty Corpus (Northern Khanty materials and translations) (UHLCS)
The Khanty computer corpus contains the following sub-corpora:
Khanty, Atlym dialect, 519 words, 3967 characters
Khanty, Kazym dialect, 62766 words, 585659 characters
Khanty, Konda dialect, 1115 words, 10234 characters
Khanty, Nizjam dialect, 17681 words, 259732 characters
Khanty, Obdorsk dialect, 10939 words, 200358 characters
Khanty, Synja dialect, 10939 words, 200358 characters.
The corpora of the Khanty dialects are samples taken from the text collections listed below.
The corpora consist of running text, and several of them are morphologically analyzed. Morphologically encoded texts are in word-per-line format, and plain texts are in sentence-per-line format. Some texts also mark clauses and sentences with information about their location in the text.
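As a rough illustration of the two layouts, the sketch below converts sentence-per-line text into a word-per-line list and tallies words and characters. It assumes plain whitespace tokenization and one sentence per non-empty line. The function names and the sample string are hypothetical placeholders. The actual corpus files may use a different encoding or annotation scheme, and how the character counts listed for each sub-corpus were computed is not stated here.

```python
def sentences_to_word_lines(text):
    """Turn sentence-per-line text into a list of words (one per output line)."""
    words = []
    for line in text.splitlines():
        # Assumption: each non-empty line is one sentence; blank lines add nothing.
        words.extend(line.split())
    return words


def corpus_size(text):
    """Return (word_count, character_count).

    Assumption: characters are counted without whitespace; the corpus
    description does not say which convention its figures follow.
    """
    words = sentences_to_word_lines(text)
    return len(words), sum(len(w) for w in words)


# Placeholder input, not taken from the corpus.
sample = "first sentence here\nsecond one\n"
print("\n".join(sentences_to_word_lines(sample)))
print(corpus_size(sample))
```

Morphologically coded files would need a parser for whatever tag layout they use, which this sketch does not attempt.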
Khanty, Textbook:
Rugin, R.P. (1990).
Shum jôxan sjun'öng xâtLöt.
(Shchastlivye den'ki na Shum-jugane.) [Happy Days on the Shum River.]
Kniga dlja dopol'nitel'nogo chtenija v 3-4 klassax xantyjskix shkol (shuryshkarskij dialekt).
Prosveshchenie, Leningrad.
The text is available in six versions: (1) the original text in the Cyrillic alphabet; (2) the same text transliterated into the Latin alphabet; translations of the text into (3) Finnish, (4) English, and (5) Russian; and (6) the Latin-alphabet text with morphological coding and an English translation.
Children's books:
Life of Jesus in Khanty (the Kazim dialect). (Trial edition).
Translation: Nyomysova, Yevdokiya Andreyevna &
Lozyamova, Zoya Nikiforovna.
ISBN 952-9790-25-2, ISBN 91-88394-97-2. 63 pp.
Institute for Bible Translation.
Stockholm & Helsinki 1995.
Life of Jesus in Khanty (the Kazim dialect). (Second edition).
Translation: Nyomysova, Yevdokiya Andreyevna &
Lozyamova, Zoya Nikiforovna.
ISBN 952-9790-40-6, ISBN 91-88794-83-0. 63 pp.
Institute for Bible Translation.
Stockholm & Helsinki 1997.
The computer corpora of the Khanty dialects and the textbook were compiled and edited by Merja Salo with the financial support of the Academy of Finland. The texts were adapted for public use with the financial support of the Department of General Linguistics, University of Helsinki. The children's books were donated to the University of Helsinki by the Institute for Bible Translation, Helsinki and Stockholm.
The Khanty Corpus is a part of the UHLCS corpus collection.
UHLCS has many different IPR holders. Should you have any questions regarding the collection, please contact Pirkko Suihkonen (
