A French Human Reference Corpus for Multi-Document Summarization and Sentence Compression

This paper presents two corpora produced within the RPM2 project: a multi-document summarization corpus and a sentence compression corpus. Both corpora are in French. The first is the only one we know of in this language. It contains 20 topics with 20 documents each. For each topic, a first set of 10 documents is summarized, and the second set is then used to produce an update summarization (new information). Four annotators were involved and produced a total of 160 abstracts. The second corpus contains all the sentences of the first one: four annotators were asked to compress its 8,432 sentences. This is the largest corpus of compressed sentences we know of in any language. The paper reports figures for comparing the annotators: compression rates, number of tokens per sentence, percentage of tokens kept according to their part of speech (POS), position of dropped tokens in the sentence compression phase, etc. These figures show substantial differences from one annotator to another. Another point ...
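As an illustration of how a per-annotator compression rate might be computed, here is a minimal sketch. It assumes the rate is the fraction of whitespace-separated tokens kept, which may differ from the definition and tokenization used in the paper. The function names and the sample sentences are hypothetical and are not taken from the corpus.

```python
def compression_rate(original: str, compressed: str) -> float:
    """Fraction of tokens kept (assumed definition: kept / original)."""
    n_original = len(original.split())  # naive whitespace tokenization (assumption)
    if n_original == 0:
        raise ValueError("original sentence has no tokens")
    return len(compressed.split()) / n_original


def mean_rate_per_annotator(pairs_by_annotator):
    """Average the per-sentence compression rate for each annotator."""
    return {
        annotator: sum(compression_rate(o, c) for o, c in pairs) / len(pairs)
        for annotator, pairs in pairs_by_annotator.items()
    }


# Hypothetical data: two annotators compressing the same invented sentence.
example = {
    "A1": [("the cat sat on the mat today", "the cat sat")],
    "A2": [("the cat sat on the mat today", "the cat sat on the mat")],
}
rates = mean_rate_per_annotator(example)
```

Under this definition a lower value means a more aggressive compression, so comparing the per-annotator means is one simple way to see differences between annotators.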

Education | LREC 2010 | Multi-document Summarization Corpus | Sentence Compression Corpus | Sentence Compression Phase |

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2010
Where	LREC
Authors	Claude de Loupy, Marie Guégan, Christelle Ayache, Somara Seng, Juan-Manuel Torres Moreno

Sciweavers
