Sciweavers

68 search results - page 9 / 14
» Text extraction in complex color documents
Sort
View
CLEF
2010
Springer
13 years 7 months ago
A Textual-Based Similarity Approach for Efficient and Scalable External Plagiarism Analysis - Lab Report for PAN at CLEF 2010
In this paper we present an approach to detect external plagiarism based on textual similarity. This is an efficient and precise method that can be applied over large sets of docum...
Daniel Micol, Óscar Ferrández, Ferna...
SIGIR
2002
ACM
13 years 6 months ago
Unsupervised document classification using sequential information maximization
We present a novel sequential clustering algorithm which is motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential ...
Noam Slonim, Nir Friedman, Naftali Tishby
WSDM
2010
ACM
215views Data Mining» more  WSDM 2010»
14 years 4 months ago
Boilerplate Detection using Shallow Text Features
In addition to the actual content Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...
Christian Kohlschütter, Peter Fankhauser, Wol...
IVC
2007
111views more  IVC 2007»
13 years 6 months ago
Colour text segmentation in web images based on human perception
There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g....
Dimosthenis Karatzas, Apostolos Antonacopoulos
WWW
2006
ACM
14 years 7 months ago
Relaxed: on the way towards true validation of compound documents
To maintain interoperability in the Web environment it is necessary to comply with Web standards. Current specifications of HTML and XHTML languages define conformance conditions ...
Jirka Kosek, Petr Nálevka