The Hungarian Reference Speech Database (MRBA) was developed at the Laboratory of Speech Acoustics of the Budapest University of Technology and Economics (BME) in collaboration with the Institute of Informatics of the University of Szeged. The main goal was to build a database of continuous read speech that can be used for training and testing PC-based automatic speech recognisers. In planning the corpus, we took the special characteristics of the Hungarian language into account.
Since Hungarian is an agglutinative language, some categories required a larger vocabulary than would otherwise be necessary. We paid particular attention to phonetically rich sentences and words in order to create a phonetically well-balanced speech database for text-independent speech recognisers. A detailed statistical analysis was carried out to examine the distribution of phonemes, diphones, triphones and syllables.
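The phoneme, diphone and triphone statistics mentioned above amount to counting phoneme n-grams of length 1, 2 and 3 over the phonetic transcriptions. The sketch below is an illustration only, not the procedure actually used for MRBA: it assumes each utterance is already available as a list of SAMPA symbols, and the two example words and the function name `ngram_counts` are hypothetical. Syllable statistics would additionally need a syllabifier, which is not shown.

```python
from collections import Counter
from typing import Iterable, Sequence


def ngram_counts(utterances: Iterable[Sequence[str]], n: int) -> Counter:
    """Count phoneme n-grams (n=1: phonemes, n=2: diphones, n=3: triphones).

    N-grams are counted within each utterance only, so no diphone or
    triphone spans the boundary between two utterances.
    """
    counts: Counter = Counter()
    for phones in utterances:
        for i in range(len(phones) - n + 1):
            counts[tuple(phones[i:i + n])] += 1
    return counts


# Hypothetical SAMPA-coded transcriptions, for illustration only.
utterances = [
    ["h", "a:", "z"],       # "ház" (house)
    ["k", "E", "r", "t"],   # "kert" (garden)
]

for n, name in [(1, "phonemes"), (2, "diphones"), (3, "triphones")]:
    c = ngram_counts(utterances, n)
    print(name, "distinct:", len(c), "total:", sum(c.values()))
```

Comparing the resulting counts against the statistics of a large text corpus is one way to judge whether a prompt set is phonetically balanced.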
Each speaker read 12 different sentences and 12 different words that were unrelated to the sentences. The database contains utterances read by 332 different speakers.
The utterances were recorded in acoustically different locations, such as offices, laboratories and homes. The database contains utterances recorded simultaneously with two different systems. One of them served as the reference system, consisting of a laptop, an external sound card and a good-quality condenser microphone; it remained unchanged until the database was finished. For the other system, we varied the microphones, sound cards and PCs.
To cover the dialects spoken in Hungary, we made recordings at four different locations in the country, and we took the gender and age of the speakers into account, so the database has a balanced distribution over gender, age and dialect.
Every spoken utterance has been labelled: each wave file (16 kHz, 16-bit, mono) has a label file containing information about the recording parameters and the orthographic transcription of the spoken material. Almost one third of the database (the utterances of 100 speakers) was manually segmented and labelled at phoneme level using SAMPA codes.
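Before using the audio for training, it can be worth checking that each wave file actually has the stated format (16 kHz, 16-bit, mono). The following is a minimal sketch using Python's standard `wave` module. The function name `has_expected_format` is hypothetical, and because the label-file layout is not described here, only the audio parameters are checked. The usage part builds a small synthetic file in memory instead of reading a real MRBA file.

```python
import io
import wave


def has_expected_format(f, rate: int = 16000, sample_width: int = 2,
                        channels: int = 1) -> bool:
    """Return True if the WAV file (path or file object) is 16 kHz, 16-bit, mono.

    sample_width is given in bytes, so 2 bytes corresponds to 16-bit samples.
    """
    with wave.open(f, "rb") as w:
        return (w.getframerate() == rate
                and w.getsampwidth() == sample_width
                and w.getnchannels() == channels)


# Usage: write 10 ms of 16 kHz, 16-bit, mono silence to memory and check it.
buf = io.BytesIO()
with wave.open(buf, "wb") as w:
    w.setnchannels(1)
    w.setsampwidth(2)
    w.setframerate(16000)
    w.writeframes(b"\x00\x00" * 160)  # 160 frames / 16000 Hz = 10 ms
buf.seek(0)
print(has_expected_format(buf))  # True
```

For real files, passing a path string such as a hypothetical `"speaker001/utt01.wav"` works the same way, since `wave.open` accepts either a filename or a file-like object.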