Recently, along with the rapid growth of the Web, the preservation efforts have also increased. As a consequence, large amounts of past Web data are stored in Web archives. This h...
This work aims to provide a novel, site-specific web page segmentation and section importance detection algorithm, which leverages structural, content, and visual information. The...
We describe a method for predicting query difficulty in a precision-oriented web search task. Our approach uses visual features from retrieved surrogate document representations (...
Eric C. Jensen, Steven M. Beitzel, David A. Grossm...
The iterative closest point (ICP) algorithm is widely used for the registration of geometric data. One of its main drawbacks is its quadratic time complexity O(N2 ) with the shape...
The Lixto project is an ongoing research effort in the area of Web data extraction. Whereas the project originally started out with the idea to develop a logic-based extraction lan...