In a semi-structured database there is no clear separation between the data and the schema, and the degree to which it is structured depends on the application. Semi-structured da...
As the Web provides rich data embedded in the immense contents inside pages, we witness many ad-hoc efforts for exploiting fine granularity information across Web text, such as We...
This paper presents Carnegie Mellon University’s experiments on the mixed named-page and homepage finding task of the TREC 12 Web Track. Our results were strong; we achieved the...
We investigate the task of finding links from Wikipedia pages to external web pages. Such external links significantly extend the information in Wikipedia with information from ...
Abstract. In this paper, we present an extensive study of the cuttingplane algorithm (CPA) applied to structural kernels for advanced text classification on large datasets. In par...