Keyword search has been recently extended to relational databases to retrieve information from text-rich attributes. However, all the existing methods focus on finding individual...
We investigates language models for informational and navigational web search. Retrieval on the web is a task that differs substantially from ordinary ad hoc retrieval. We perfor...
Structure analysis of table form documents is an important issue because a printed document and even an electronic document do not provide logical structural information but merely...
This paper presents results in automated genre classification of digital documents in PDF format. It describes genre classification as an important ingredient in contextualising s...
In this paper we propose to define a measure of visual similarity to compare different pages in a corpus. This measure is based on the analysis of the visual layout saliency of th...