TRAD Pashto-French Parallel corpus of transcribed Broadcast News Speech - Training data



Corpus TRAD parallèle pachto-français (transcriptions d’actualités radio et télédiffusées) - Données d'entraînement

http://catalog.elra.info/product_info.php?products_id=1267

ID: ELRA-W0093

The corpus consists of French translations of the transcriptions of 106 hours of Pashto broadcast news recordings. The transcriptions are taken from the TRAD Pashto Broadcast News Speech Corpus (ELRA-S0381). The corpus contains about 832,000 source (Pashto) words and 747,000 target (French) words. No audio files are provided.

Pashto is an Indo-Iranian language spoken by the Pashtun people, mainly in Pakistan and Afghanistan.

This corpus was produced by ELDA within the PEA TRAD project, supported by the French Ministry of Defence (DGA, Direction Générale de l'Armement). It was used as training data for language modelling in machine translation.




Distribution

Availability

Available - Restricted Use

Start date: 06/04/2016

Licence

Licence          Restrictions                     ELRA membership    User nature    Fee
ELRA END USER    Academic - Non Commercial Use    Non-member         Commercial     18,000.00
ELRA VAR         Commercial Use                   Member             Commercial     10,000.00
ELRA END USER    Academic - Non Commercial Use    Member             Commercial     10,000.00
ELRA VAR         Commercial Use                   Member             Academic       10,000.00
ELRA END USER    Academic - Non Commercial Use    Member             Academic        3,000.00
ELRA VAR         Commercial Use                   Non-member         Commercial     18,000.00
ELRA VAR         Commercial Use                   Non-member         Academic       18,000.00
ELRA END USER    Academic - Non Commercial Use    Non-member         Academic        4,000.00

Contact Person

Mapelli Valérie

text

Monolingual text corpus

Languages

French, Pushto

Linguality

Linguality type: Monolingual

Size

27.6 Mb

Metadata

Created: 12/05/2005

Version

Version: 1.0

Last Updated: 06/04/2016
