Hungarian National Corpus – META-SHARE

Last view: 2026-06-25


Hungarian National Corpus

HNC

http://hnc2.nytud.hu


ID: HGC

The national corpus of the Hungarian language, divided into five subcorpora by regional language variant and into five subcorpora by text genre. The subcorpus to be studied can be selected by any combination of these, which makes the HNC an appropriate tool for studying differences not only between text genres but also between language variants. HGC aims to be a representative general-purpose corpus of present-day standard Hungarian.
HNC v2 is based on the Hungarian National Corpus, with higher quality and a finer level of analysis and annotation: detailed morphosyntactic analysis and disambiguation with an updated processing toolchain, NP chunking, named entity recognition, distributional analysis, and built-in post-processing (multilevel frequency lists, subsequent searches on previous results). HNC2 has been extended to the 1-gigaword threshold, with extended metadata and cleared IPR.


Distribution

Availability

Available - Unrestricted Use

Licence

MS Commons - BY - NC

Restrictions: Academic - Non Commercial Use

Distribution Access/Medium: Accessible Through Interface

Contact Person

text

Monolingual text corpus

Languages

Hungarian (1 000 976 483 Tokens)

Linguality

Linguality type: Monolingual

Size

1 000 976 483 Tokens

Modalities

Other

Annotation

Morphosyntactic Annotation - B Pos Tagging

Metadata

Created: 25/01/2013

Last Updated: 09/04/2013
