ELFA Corpus

The corpus is available in Kielipankki, the Language Bank of Finland (lat.csc.fi, listed there among Restricted use corpora; download at http://urn.fi/urn:nbn:fi:lb-2014052721). Access rights instructions: https://kitwiki.csc.fi/twiki/bin/view/FinCLARIN/KielipankkiAccessRights. Apply here: https://lbr.csc.fi/web/guest/catalogue?domain=LBR&resource=urn:nbn:fi:lb-201403262&target=application

Altogether, the ELFA (English as a Lingua Franca in Academic Settings) corpus contains 1 million words of transcribed spoken academic ELF (approximately 131 hours of recorded speech). The data consists of both recordings and their transcripts, which will be available to researchers on request. The recordings were made at the University of Tampere, the University of Helsinki, Tampere University of Technology, and Helsinki University of Technology.

The speech events in the corpus include both monologic events, such as lectures and presentations (33% of the data), and dialogic/polylogic events, such as seminars, thesis defences, and conference discussions, which make up the larger share of the data (67%).

As for the disciplinary domains, the ELFA corpus covers social sciences (29% of the recorded data), technology (19%), humanities (17%), natural sciences (13%), medicine (10%), behavioural sciences (7%), and economics and administration (5%).

The speakers in ELFA also represent a wide range of first-language backgrounds: the data comprises approximately 650 speakers with 51 different first languages, ranging from African languages (e.g. Akan, Dagbani, Igbo, Kikuyu, Somali, Swahili) to Asian languages (e.g. Arabic, Bengali, Chinese, Hindi, Japanese, Persian, Turkish, Uzbek) and European languages (e.g. Czech, Danish, Dutch, French, German, Italian, Lithuanian, Polish, Portuguese, Russian, Romanian, Swedish). Native English speakers account for 5% of the speech. Considering that the recordings were made at Finnish-speaking universities, the share of speech by speakers with Finnish as their mother tongue is also relatively low, at 28.5%.

The ELFA corpus is available at http://lat.csc.fi/.

The intended use of the resource must be outlined in a research plan.

Important: due to the nature of the material, the resource should be handled with care in order to respect the privacy of the personal data. If samples of the data are published, they must be anonymized according to best practices.

For detailed information on the license of the resource see http://urn.fi/urn:nbn:fi:lb-20150304132

Detailed information on the corpus: http://www.helsinki.fi/englanti/elfa/elfacorpus

Download: https://korp.csc.fi/download/ELFA/
