Identifying and expanding titles in web texts

In this paper, we present an analysis based on linguistic and typographic features that allows for the identification of titles in web documents. We focus in particular on procedural texts. Identifying titles is a difficult task because the ways of encoding them are very diverse. A number of titles are also incomplete because they depend on their context; we therefore propose a way to retrieve the missing elements, in particular predicates, so that titles are fully intelligible.

Categories and Subject Descriptors: H.4 [Information Systems Applications]: Miscellaneous

General Terms: Human factors, Experimentation

Keywords: structure analysis, text semantics, text titles

Clémentine Adam, Estelle Delpech, Patrick Saint-Dizier

DOCENG 2008 | Document Analysis | Keywords Structure Analysis | Miscellaneous General Terms | Procedural Texts |

» Lexicon Development and POS Tagging Using a Tagged Bengali News Corpus

» Constructing a text corpus for inexact duplicate detection

» Web scale NLP a case study on url word breaking

» A Comparison of Stylometric and Lexical Features for Web Genre Classification and Emotion ...

Post Info

Added	19 Oct 2010
Updated	19 Oct 2010
Type	Conference
Year	2008
Where	DOCENG
Authors	Clémentine Adam, Estelle Delpech, Patrick Saint-Dizier
