Corpus of Spoken Bulgarian

SpokenBg

http://bgspeech.net/

ID: 820

The Corpus of Spoken Bulgarian (SpokenBg) is a selection of spoken Bulgarian data, including data from interviews, media and formal speech, student speech, academic speech, and colloquial speech. The total size of the corpus was 523,128 signs as of the end of 2012.
Part of the corpus consists of edited transcripts of conversational speech that were converted from a semi-phonetic transcription to standard orthography. The original semi-phonetic transcripts are presented alongside the edited versions in a paragraph-aligned display.

Distribution

Availability

Available - Unrestricted Use

Start date: 21/01/2013

Licence

CC-BY-NC-SA

Distribution Access/Medium: Accessible Through Interface

IPR Holder

Sofia University “St. Kliment Ohridski”

Contact Person

Yovka Tisheva

text

Monolingual text corpus

Languages

Bulgarian

Linguality

Linguality type: Monolingual

Size

10 Hours

Monolingual text corpus

Languages

Bulgarian

Linguality

Linguality type: Monolingual

Size

523,128 Phonetic Units

Character encoding

UTF-8

Modalities

Spoken Language

Annotation

Speech Annotation - Speaker Turns

Segmentation level: Paragraph, Sentence, Word

Resource Creation

Creation started: 01/01/2010

Funding Project

Models and tools for spoken communication of contemporary Bulgarian language

Funding Type: National Funds

Funding Country: Bulgaria

Central and South-East European Resources (CESAR)

URL: http://cesar.nytud.hu/

Funding Type: EU Funds

Project duration: 01/02/2011 - 30/01/2013

Metadata

Created: 27/01/2013

Last Updated: 01/02/2013

Version

Version: 4.0

Validation

Validated

Documentation

Atanasov, Atanas. Problems in creating language corpora of transcribed Bulgarian colloquial speech [in Bulgarian]. - In: Paisii Readings. Scientific Works. Vol. 44, Book 1, Part A, 2006. “Paisii Hilendarski” University Press, Plovdiv, 2006, pp. 289-296.

Tisheva, Yovka, Marina Dzhonova. Electronic resources for Bulgarian colloquial speech (the BgSpeech initiative) [in Bulgarian]. - Littera et Lingua, Summer 2010.

Atanasov, Atanas. Encoding Bulgarian Colloquial Speech Using TEI Specification. - In: Computer Applications in Slavic Studies. “Boyan Penev” Publishing Center, Sofia, 2006, pp. 233-240.
