Summ-it


http://www.inf.pucrs.br/ontolp/downloads-ontolpplugin.php

The corpus was developed as a linguistic resource for research on automatic summarization and its relation to other issues, in order to support studies on discourse processing.
Summ-it consists of fifty texts from the science domain, extracted from the Science section of the Brazilian daily newspaper Folha de São Paulo (FSP), and comprises:
I. Human summaries produced by experts in summarization (Coelho, 2007), who rewrote the original texts in a compressed format.
II. Automatic summaries obtained with GistSumm (Pardo et al., 2002, 2003) and SuPor-2 (Leite and Rino, 2006a, 2006b, 2006c). All summaries were generated with a 70% compression rate, which means that each summary corresponds to roughly 30% of its original text.
III. Manually underlined sentences that contain relevant information from the original texts (see 3.2).
IV. Texts semi-automatically annotated with morpho-syntactic information, with the aid of the syntactic parser PALAVRAS (available at: http://visl.sdu.dk/visl/pt/) and the Xtractor converter (available at: http://abc.di.uevora.pt/xtractor/).
V. Texts semi-automatically annotated with coreference information for noun phrases (MMAX) and with rhetorical relations (RST) (cf. Carbonel et al., 2007, Fuchs, 2008, and Collovini et al., 2007). The first process aims to identify the discourse entities (e.g. noun phrases) that are referred to or recovered in the text; the second structures a text by relating its discourse units through RST relations.
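
The compression rate mentioned in item II can be illustrated with a small calculation. The following is only a minimal sketch: it assumes that length is measured in words (the description does not state the unit), and the function names `target_summary_length` and `compression_rate` are hypothetical helpers, not part of GistSumm or SuPor-2.

```python
def target_summary_length(source_words: int, rate: float) -> int:
    """Approximate summary length for a given compression rate.

    Assumption: length is counted in words; the corpus description
    does not say which unit the summarizers actually used.
    """
    if not 0.0 <= rate <= 1.0:
        raise ValueError("rate must be between 0 and 1")
    # A compression rate of 0.7 removes about 70% of the source,
    # so about 30% of it remains in the summary.
    return round(source_words * (1.0 - rate))


def compression_rate(source_words: int, summary_words: int) -> float:
    """Fraction of the source text that the summary leaves out."""
    if source_words <= 0:
        raise ValueError("source_words must be positive")
    return 1.0 - summary_words / source_words


if __name__ == "__main__":
    # A hypothetical 1000-word text summarized at a 70% compression rate.
    print(target_summary_length(1000, 0.7))
    print(compression_rate(1000, 300))
```

Under these assumptions, a 1000-word source text at a 70% compression rate yields a summary of about 300 words.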


Distribution

Availability

Available - Restricted Use

Licence

CC-BY-NC-SA

Distribution Access/Medium: Downloadable

IPR Holder

Renata Vieira

Contact Person

Renata Vieira

text

Monolingual text corpus

Languages

Portuguese (50 Texts)

Variety: Brazilian Portuguese (Type: Other) (50 Texts)

Linguality

Linguality type: Monolingual

Text Format

text/xml (50 Texts)

Size

50 Texts

Character encoding

UTF-8 (50 Texts)

Domains

Science (50 Texts)

Modalities

Written Language

Geographic coverage

Brazil (50 Texts)

Creation

Creation mode details: Human summaries produced by experts in summarization, and automatic summaries obtained with GistSumm and SuPor-2.

Creation mode: Mixed

Creation Tools

PALAVRAS
MMAX
Xtractor converter

Metadata

Created: 10/07/2012

Last Updated: 28/11/2012

Source: METANET4U

METASHARE

Metadata Language: English

Metadata Creator

Catarina Carvalheiro

Version

Version: 1

Last Updated: 10/07/2012

Usage

Foreseen Use - Nlp Applications

Use NLP Specific: Discourse Analysis, Summarisation

Actual Use - Nlp Applications

Use NLP Specific: Discourse Analysis, Summarisation

Documentation

Tool Documentation: Online

Samples Location: http://194.117.45.19...

Document Type: Article

Collovini, S., Carbonel, T., Fuchs, J. T., Coelho, J. C., Rino, L., and Vieira, R., "Summ-it: Um corpus anotado com informações discursivas visando à sumarização automática", http://www.inf.pucrs..., 5.º Workshop em Tecnologias da Informação e da Linguagem Humana, 2007

Editor: TIL'2007

Publisher: TIL'2007

Document Type: Other

Catarina Carvalheiro, Summ-it Narrative Description, http://194.117.45.19...
