PELCRA Spoken Learner English Corpus

PELCRA-PLEC-SP

http://pelcra.pl/res/spoken/plec

ID: 516

A subset of the PELCRA PLEC corpus containing 15 hours (131,000 transcribed words) of recorded informal interviews with Polish learners of English. The recordings are time-aligned at the utterance level, manually annotated for mispronunciation errors, and provided as TEI P5-conformant XML and EAF (ELAN) files.

Distribution

Availability

Available - Restricted Use

Start date: 07/12/2012

Licence

CC-BY-NC

Restrictions: Attribution, Other

Download location: hidden

Distribution Access/Medium: Downloadable

Attribution Details: Pęzik, Piotr. 2012. “Towards the PELCRA Learner English Corpus.” In "Corpus Data Across Languages and Disciplines", ed. Piotr Pęzik, 28:33–42. Łódź Studies in Language. Peter Lang.

Licensors:

Piotr Pęzik

Distribution rights holders:

University of Łódź

IPR Holder

University of Łódź

Contact Person

Piotr Pęzik

text
audio

Monolingual text corpus

Languages

English

Language Script: Latn

Linguality

Linguality type: Monolingual

Size

15 Hours

131,542 Words

Character encoding

UTF-8 (131,542 Words)

Domains

general (131,542 Words)

Modalities

Recordings of interviews with Polish learners of English.

Spoken Language

Annotation

Speech Annotation - Orthographic Transcription

Annotated elements: Mispronunciations

StandOff: True

Segmentation level: Utterance, Word

Format: text/xml

Standard practices conformance: TEI_P5

Annotation Mode: Manual (All personal information has been anonymised.)

Start date: 01/07/2011

End date: 01/11/2013

Size: 131,542 Words

Speech Annotation - Phonetic Transcription

Annotated elements: Mispronunciations

StandOff: True

Segmentation level: Word

Format: text/xml

Standard practices conformance: TEI_P5

Annotation Mode: Manual (All personal information has been anonymised.)

Start date: 01/07/2011

End date: 01/11/2013

Size: 131,542 Words

Speech Annotation - Sound To Text Alignment

Annotated elements: Mispronunciations

StandOff: True

Segmentation level: Utterance, Word

Format: text/xml

Standard practices conformance: TEI_P5, Other

Annotation Mode: Manual (All personal information has been anonymised.)

Start date: 01/07/2011

End date: 01/11/2013

Size: 131,542 Words
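The sound-to-text alignment is distributed as EAF (ELAN) files, so utterance timings can be read with a standard XML parser. The following is a minimal sketch, assuming only the generic EAF layout (TIME_ORDER/TIME_SLOT entries referenced by ALIGNABLE_ANNOTATION elements inside TIER elements, with times in milliseconds). The function name is hypothetical, and the tier names actually used in this corpus are not stated in this record, so the sketch simply reports whatever TIER_ID each file contains.

```python
import xml.etree.ElementTree as ET


def read_aligned_annotations(eaf_source):
    """Yield (tier_id, start_ms, end_ms, text) for each time-aligned annotation.

    `eaf_source` is a path or a binary file object holding an EAF document.
    Tier names are not assumed; they are read from the file.
    """
    root = ET.parse(eaf_source).getroot()

    # TIME_SLOT elements map slot ids to millisecond offsets.
    # TIME_VALUE is optional in EAF, so unaligned slots are skipped.
    slots = {
        slot.get("TIME_SLOT_ID"): int(slot.get("TIME_VALUE"))
        for slot in root.iterfind("TIME_ORDER/TIME_SLOT")
        if slot.get("TIME_VALUE") is not None
    }

    for tier in root.iterfind("TIER"):
        for ann in tier.iterfind("ANNOTATION/ALIGNABLE_ANNOTATION"):
            start = slots.get(ann.get("TIME_SLOT_REF1"))
            end = slots.get(ann.get("TIME_SLOT_REF2"))
            text = ann.findtext("ANNOTATION_VALUE", default="")
            yield tier.get("TIER_ID"), start, end, text
```

Calling `list(read_aligned_annotations("interview.eaf"))` on a downloaded file (the file name here is a placeholder) would give one tuple per aligned segment, which is usually enough to cut the matching audio span from the WAV recording.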

Geographic coverage

Poland (131,542 Words)

Creation

Creation mode details: Recordings of interviews with Polish learners of English.

Creation mode: Manual

Creation Tools

ELAN (http://tla.mpi.nl/to...)

Monolingual audio corpus

Languages

English (131,542 Words)

Linguality

Linguality type: Monolingual

Size

8 Gb

131,542 Words

Audio duration

15 Hours

Domains

general (131,542 Words)

Modalities

Recordings of interviews with Polish learners of English. (131,542 Words)

Spoken Language (131,542 Words)

Classification

(131,542 Words)

Register: informal

Audio genre: Speech

Speech genre: Conversation

Content

Speech items: Free Speech

Non-speech items: Noise

Noise Level: Medium

Setting

Naturality: Assisted

Conversational type: Multilogue

Interactivity: Overlapping

Audio Formats

audio/wav (131,542 Words)

Compression: False

Recording quality: Medium

Quantization: 16

Number of tracks: 1

Sampling rate: 44100

Signal encoding: LinearPCM
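The audio format fields above (uncompressed linear PCM, 16-bit quantization, one track, 44100 Hz) can be checked against a downloaded recording. The following is a minimal sketch using Python's standard `wave` module; the function names and the `EXPECTED` dictionary are illustrative and not part of the corpus distribution.

```python
import wave

# Expected values taken from the audio format fields of this record:
# 1 track, 16-bit quantization (2 bytes per sample), 44100 Hz sampling rate.
EXPECTED = {"channels": 1, "sample_width_bytes": 2, "sample_rate_hz": 44100}


def read_wav_params(path):
    """Return channel count, sample width and sampling rate of a PCM WAV file."""
    with wave.open(path, "rb") as wav:
        return {
            "channels": wav.getnchannels(),
            "sample_width_bytes": wav.getsampwidth(),
            "sample_rate_hz": wav.getframerate(),
        }


def mismatches(path):
    """Map each differing field to an (expected, actual) pair; empty if all match."""
    actual = read_wav_params(path)
    return {
        key: (EXPECTED[key], actual[key])
        for key in EXPECTED
        if actual[key] != EXPECTED[key]
    }
```

An empty dictionary from `mismatches(path)` means the file matches the stated format; any other result lists the fields that differ, which is useful before feeding the recordings to tools that assume a fixed sampling rate.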

Geographic coverage

Poland (131,542 Words)

Recording

Recorders

University of Łódź

Recording environment: Other

Recording device type: Hard Disk

Capture

Capturing device type details: Conversations were captured using an audio console set with external microphones or a voice recorder.

Capturing device type: Microphone

Capturing environment: Complex

Person SourceSet

Origin of persons: Native

Sex of persons: Mixed

Number of persons: 119

Age range end: 45

Age range start: 8

Geographic distribution of persons: Łódź region.

Creation

Creation mode details: Recordings of interviews with Polish learners of English.

Creation mode: Manual

Creation Tools

ELAN (http://tla.mpi.nl/to...)

Resource Creation

Resource Creator

Piotr Pęzik

University of Łódź

Łukasz Dróżdż

Funding Project

Central and South-East European Resources (CESAR - 271022)

URL: http://www.meta-net....

Funding Type: EU Funds

Funder: DG INFSO of the European Commission

Funding Country: European Union

Project duration: 01/02/2011 - 31/01/2013

Central and South-East European Resources (CESAR - MNiSW 2139/CIP2007-2011/2)

URL: http://en.kpk.gov.pl...

Funding Type: National Funds

Funder: Ministry of Science and Higher Education

Funding Country: Poland

Project duration: 01/02/2011 - 31/01/2013

Polish Ministry of Science and Higher Education grant (PLEC - N N104 205039)

URL: http://pelcra.pl/plec/

Funding Type: National Funds

Funder: Ministry of Science and Higher Education

Funding Country: Poland

Project duration: 01/12/2010 - 01/11/2013

Metadata

Created: 30/06/2012

Metadata Language: English (en)

Metadata Creator

Maciej Buczek

Piotr Pęzik

Version

Version: 1.0

Revision: compilation of the corpus

Last Updated: 07/12/2012

Validation

Validated (131,542 Words)

Type of Validation: Formal

Validation Mode: Automatic

Extent: Full

Validation Tools:

xmllint

Validator

Łukasz Dróżdż

Piotr Pęzik

Documentation

http://pelcra.pl/res...
