SIEMENS 100 - SI100



The corpus contains read speech from 101 different speakers. Each speaker read approximately 100 sentences taken from the German newspaper Süddeutsche Zeitung (SZ). The text material consists of two subcorpora: the SZ subcorpus (544 sentences from newspaper articles) and the CeBit subcorpus (483 sentences from newspaper articles about CeBit 1995). Each subcorpus is divided into 5 parts of approximately 100 utterances each. With some exceptions, every speaker read only one part of one subcorpus, resulting in a total of approximately 10100 recorded utterances (7 CDROMs).
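
The figures above can be cross-checked with a short calculation. The sketch below is illustrative only: the function name part_sizes and the assumption that each subcorpus is split into nearly equal parts are hypothetical, since the description states only that each part holds approximately 100 utterances.

```python
# Figures taken from the corpus description; everything else is an assumption.
SPEAKERS = 101
SENTENCES_PER_SPEAKER = 100  # approximate value from the description
SUBCORPUS_SIZES = {"SZ": 544, "CeBit": 483}
PARTS_PER_SUBCORPUS = 5


def part_sizes(total, parts):
    """Split `total` sentences into `parts` nearly equal parts.

    An even split is assumed here; the actual part boundaries
    are not given in the description.
    """
    base, extra = divmod(total, parts)
    return [base + 1 if i < extra else base for i in range(parts)]


# Approximate number of recorded utterances: one part per speaker.
total_utterances = SPEAKERS * SENTENCES_PER_SPEAKER

# Hypothetical part sizes for each subcorpus under the even-split assumption.
split = {name: part_sizes(n, PARTS_PER_SUBCORPUS)
         for name, n in SUBCORPUS_SIZES.items()}

print(total_utterances)
for name, sizes in split.items():
    print(name, sizes, sum(sizes))
```

Under this assumption the SZ parts come out slightly above 100 sentences and the CeBit parts slightly below, which is consistent with "approximately 100 utterances each".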

