A new handwritten text database, GERMANA, is presented to facilitate empirical comparison of different approaches to text line extraction and off-line handwriting recognition. G...
This paper describes the participation of Columbus Project of Microsoft Research Asia (MSRA) in the GeoCLEF 2006 (a cross-language geographical retrieval track which is part of Cr...
Zhisheng Li, Chong Wang 0002, Xing Xie, Xufa Wang,...
We present Content Extraction via Tag Ratios (CETR) – a method to extract content text from diverse webpages by using the HTML document’s tag ratios. We describe how to comput...
This paper details the participation of the XLDB group from the University of Lisbon at the GeoCLEF task of CLEF 2006. We tested text mining methods that make use of an ontology t...
Bruno Martins, Nuno Cardoso, Marcirio Silveira Cha...
In this paper we propose a novel algorithm for opinion summarization that takes account of content and coherence, simultaneously. We consider a summary as a sequence of sentences ...