Monolingual text corpus
Languages: Polish (73 Texts)
Language Script: Latn
Linguality type: Monolingual
Size: 73 Texts, 43 Hours, 386,744 Words
Text format
Character encoding: UTF-8 (73 Texts)
Domains: Transcriptions of spontaneous conversations between speakers of diverse ages (1-90 years) from various geographic regions (73 Texts)
Modalities: Spoken language (73 Texts)
Annotation
Speech Annotation - Sound To Text Alignment
StandOff: False
Segmentation level: Utterance
Format: text/xml
Standard practices conformance: TEI_P5
Annotation Mode: Manual (all personal information has been anonymised)
Start date: 04/05/2011
End date: 12/05/2012
Size: 73 Texts
Speech Annotation - Speaker Turns
StandOff: False
Segmentation level: Utterance
Format: text/xml
Standard practices conformance: TEI_P5
Annotation Mode: Manual
Start date: 04/05/2011
End date: 12/05/2012
Size: 73 Texts
Speech Annotation - Orthographic Transcription
Annotated elements: Background Noise, Speaker Noise
StandOff: False
Segmentation level: Utterance
Format: text/xml
Standard practices conformance: TEI_P5
Annotation Mode: Manual
Start date: 04/05/2011
End date: 12/05/2012
Size: 73 Texts
Time coverage: 2008-2010 (73 Texts)
Geographic coverage: Poland (73 Texts)
Creation
Creation mode details: Recordings of spontaneous conversations, manually transcribed orthographically and time-aligned at the utterance level.
Creation mode: Manual
Monolingual audio corpus
Languages: Polish (73 Files)
Linguality type: Monolingual
Size (audio duration): 43 Hours
Domains: Transcriptions of spontaneous conversations between speakers of diverse ages (1-90 years) from various geographic regions (73 Files)
Modalities: Spoken language (73 Files)
Classification (73 Files)
Register: informal
Audio genre: Speech
Speech genre: Conversation
Content
Speech items: Free Speech
Non-speech items: Music, Noise, Sounds
Noise Level: Low
Setting
Naturality: Spontaneous
Conversational type: Multilogue
Interactivity: Overlapping
Audio format: audio/wav (73 Files)
Compression: False
Recording quality: Medium
Quantization: 16 bits
Number of tracks: 2
Sampling rate: 44,100 Hz
Signal encoding: LinearPCM
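The audio parameters listed above can be checked against the distributed files. The following is a minimal sketch using Python's standard-library `wave` module; the helper names `check_wav_format` and `pcm_size_bytes` are hypothetical and are not part of any tool shipped with the corpus. The expected values are simply those declared in this record (2 tracks, 16-bit samples, 44,100 Hz, uncompressed linear PCM).

```python
import wave

# Expected values taken from the audio format fields of this record.
EXPECTED = {
    "nchannels": 2,      # Number of tracks
    "sampwidth": 2,      # 16-bit quantization = 2 bytes per sample
    "framerate": 44100,  # Sampling rate in Hz
    "comptype": "NONE",  # Uncompressed linear PCM
}


def check_wav_format(source):
    """Return (field, expected, actual) tuples for every mismatch.

    Hypothetical helper: `source` may be a file path or a binary file
    object; an empty list means the file matches the declared format.
    """
    with wave.open(source, "rb") as wav:
        actual = {
            "nchannels": wav.getnchannels(),
            "sampwidth": wav.getsampwidth(),
            "framerate": wav.getframerate(),
            "comptype": wav.getcomptype(),
        }
    return [
        (field, expected, actual[field])
        for field, expected in EXPECTED.items()
        if actual[field] != expected
    ]


def pcm_size_bytes(hours, rate=44100, channels=2, bytes_per_sample=2):
    """Raw PCM payload size in bytes for a given duration (WAV headers excluded)."""
    return int(hours * 3600 * rate * channels * bytes_per_sample)
```

As a rough estimate under the same assumptions, `pcm_size_bytes(43)` gives 27,306,720,000 bytes, i.e. about 27 GB of raw audio for the declared 43 hours; actual file sizes will differ slightly because of WAV headers and rounding of the stated duration.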
Time coverage: 2008-2010 (73 Texts)
Geographic coverage: Poland (73 Texts)
Recording Recorders
Recording environment: Other
Recording device type: Flash
Source channel: Airflow
Capture
Capturing device type details: The conversations were captured using a digital voice recorder.
Capturing device type: Microphone
Capturing environment: Complex
Capturing details: Whenever possible, the recordings were made without the speakers being aware that they were being recorded. All participants were afterwards asked for permission to use the recordings.
Person SourceSet
Age range start: 1
Age range end: 90
Geographic distribution of persons: Various regions across Poland.
Origin of persons: Native
Sex of persons: Mixed
Creation
Creation mode details: Recordings of spontaneous conversations, manually transcribed orthographically and time-aligned at the utterance level.
Creation mode: Manual