1-million subcorpus of the National Corpus of Polish




The National Corpus of Polish (Polish: Narodowy Korpus Języka Polskiego, NKJP) is a joint initiative of four institutions: the Institute of Computer Science at the Polish Academy of Sciences (coordinator), the Institute of Polish Language at the Polish Academy of Sciences, Polish Scientific Publishers PWN, and the Department of Computational and Corpus Linguistics at the University of Łódź. It has been registered as a research and development project of the Ministry of Science and Higher Education. The corpus sources include classic literature, daily newspapers, specialist periodicals and journals, transcripts of conversations, and a variety of short-lived and internet texts. The sources are diverse in subject and genre. The spoken part covers both male and female speakers of various age groups from various regions of Poland. The 1-million subcorpus of NKJP has been manually annotated.

  • Morfeusz SGJP (automatic), Anotatornia (manual)
  • Morfeusz SGJP (automatic), Anotatornia (manual)
  • Anotatornia
  • Spejd (automatic), TrEd (manual)
  • Nerf (automatic), TrEd (manual)
  • Morfeusz SGJP (automatic), Anotatornia (manual)
  • Morfeusz SGJP (automatic), Anotatornia (manual)
  • WSDDE (automatic), Anotatornia (manual)
  • text collected by IJP PAN, PELCRA and PWN specifically for NKJP
  • the PWN corpus
  • the PELCRA corpus
  • the IPI PAN corpus
  • Anotatornia
  • various shell scripts
  • Nerf
  • Spejd
  • Morfeusz SGJP