A "scientific" corpus of modern French ("La Recherche" magazine) - Complete version




French title: Corpus Scientifique du français contemporain (magazine "La Recherche") - Version complète

http://catalog.elra.info/product_info.php?products_id=595

ID: ELRA-W0025_02

This "scientific" corpus of modern French was produced by the University of Nantes (France) as part of the European Commission-funded project LRsP&P (Language Resources Production & Packaging - LE4-8335).
The corpus contains all articles published in La Recherche magazine in 1998, from issue 305 (January) to issue 315 (December), for a total of 447,244 tokens and 30,238 types. It is intended for use in text analysis and related applications.
The texts were supplied in XML (Extensible Markup Language) format and have been converted to SGML (Standard Generalized Markup Language). The XML encoded only the structure of the text, i.e. its constituent parts (title, body, etc.), whereas the richer SGML markup goes down to the word level, giving the grammatical category and the canonical form (lemma) of each word. The annotation follows the guidelines of the international Text Encoding Initiative (TEI).
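To illustrate what word-level annotation with a grammatical category and a canonical form allows, the Python sketch below reads a tiny annotated fragment and counts tokens, types and lemmas. Everything about the markup is an assumption: the `<w>` element, its `pos` and `lemma` attributes, the tag values and the sample sentence are hypothetical, loosely modelled on TEI conventions, and are not the corpus's documented SGML tag set.

```python
import re
from collections import Counter

# Hypothetical word-level markup: the <w> element and its "pos"
# (grammatical category) and "lemma" (canonical form) attributes are
# assumptions modelled on TEI conventions, not the corpus's actual tag set.
SAMPLE = """
<s><w pos="DET" lemma="le">Les</w> <w pos="NOUN" lemma="chercheur">chercheurs</w>
<w pos="VERB" lemma="publier">publient</w> <w pos="DET" lemma="le">les</w>
<w pos="NOUN" lemma="résultat">résultats</w></s>
"""

# One annotated word. Assumes double-quoted attributes in a fixed order and an
# explicit </w>; real SGML may omit end tags or quotes, needing a laxer parser.
WORD_RE = re.compile(r'<w pos="([^"]*)" lemma="([^"]*)">([^<]*)</w>')


def read_words(text):
    """Return a list of (form, pos, lemma) triples found in the text."""
    return [(form, pos, lemma) for pos, lemma, form in WORD_RE.findall(text)]


words = read_words(SAMPLE)
tokens = [form for form, _, _ in words]
# Counting types as distinct lower-cased forms is itself an assumption;
# the catalog does not say how its 30,238 types were counted.
types = {form.lower() for form in tokens}
lemmas = {lemma for _, _, lemma in words}
pos_counts = Counter(pos for _, pos, _ in words)

print(len(tokens), len(types), len(lemmas))  # 5 4 4
print(pos_counts.most_common())
# Type/token ratio implied by the figures given in the description.
print(f"corpus type/token ratio: {30_238 / 447_244:.3f}")  # 0.068
```

Because "Les" and "les" share a type and "chercheurs"/"chercheur" are linked through the lemma attribute, the fragment shows how lemma-level annotation lets counts be made over canonical forms rather than surface forms.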



Distribution

Availability

Available - Restricted Use

Start date: 15/12/2000

Licence

Licence | Restrictions | Offered to | User Nature
ELRA END USER | Academic - Non Commercial Use | Members of ELRA | Academic
ELRA END USER | Academic - Non Commercial Use | Members of ELRA | Commercial
ELRA END USER | Academic - Non Commercial Use | Non Members of ELRA | Academic
ELRA END USER | Academic - Non Commercial Use | Non Members of ELRA | Commercial
ELRA VAR | Commercial Use | Members of ELRA | Academic
ELRA VAR | Commercial Use | Members of ELRA | Commercial
ELRA VAR | Commercial Use | Non Members of ELRA | Academic
ELRA VAR | Commercial Use | Non Members of ELRA | Commercial

Contact Person

Mapelli Valérie

text

Monolingual text corpus

Languages

French

Linguality

Linguality type: Monolingual

Size

No size recorded in this field (the description reports 447,244 tokens and 30,238 types)

Resource Creation

Funding Project

LRsP&P (Language Resources Production & Packaging - LE4-8335)

Funding Type: EU funds

Metadata

Created: 12/05/2005

Version

Version: 1.0

Last Updated: 22/02/2007
