Search Sciweavers | Sciweavers

68 search results - page 9 / 14

» Text extraction in complex color documents


CLEF
2010
Springer

191 views, Information Technology

A Textual-Based Similarity Approach for Efficient and Scalable External Plagiarism Analysis - Lab Report for PAN at CLEF 2010

15 years 5 months ago

Download www.uni-weimar.de

In this paper we present an approach to detect external plagiarism based on textual similarity. This is an efficient and precise method that can be applied over large sets of docum...

Daniel Micol, Óscar Ferrández, Ferna...


SIGIR
2002
ACM

152 views, Information Technology

Unsupervised document classification using sequential information maximization

15 years 3 months ago

Download www.cs.huji.ac.il

We present a novel sequential clustering algorithm which is motivated by the Information Bottleneck (IB) method. In contrast to the agglomerative IB algorithm, the new sequential ...

Noam Slonim, Nir Friedman, Naftali Tishby


WSDM
2010
ACM

215 views, Data Mining

Boilerplate Detection using Shallow Text Features

16 years 1 month ago

Download www.wsdm-conference.org

In addition to the actual content, Web pages consist of navigational elements, templates, and advertisements. This boilerplate text typically is not related to the main content, ma...

Christian Kohlschütter, Peter Fankhauser, Wol...


IVC
2007

111 views

Colour text segmentation in web images based on human perception

15 years 3 months ago

Download www.cse.salford.ac.uk

There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g....

Dimosthenis Karatzas, Apostolos Antonacopoulos


WWW
2006
ACM

135 views, Internet Technology

Relaxed: on the way towards true validation of compound documents

16 years 4 months ago

Download www.medieq.org

To maintain interoperability in the Web environment it is necessary to comply with Web standards. Current specifications of HTML and XHTML languages define conformance conditions ...

Jirka Kosek, Petr Nálevka
