With our participation in TREC2004, we test Terrier, a modular and scalable Information Retrieval framework, in three tracks. For the mixed query task of the Web track, we employ ...
In this paper, we consider the problem of extracting structured data from web pages taking into account both the content of individual attributes as well as the structure of pages...
Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
We describe an adaptive method for extracting records from web pages. Our algorithm combines a weighted tree matching metric with clustering for obtaining data extraction patterns...
Recent development of location technologies enables us to obtain the location history of users. This paper proposes a new method to infer users’ longterm properties from their r...