Sciweavers

563 search results - page 44 / 113
» Crawling the web for structured documents
CACM
1998
15 years 3 months ago
Viewing WISs as Database Applications
abstraction for modeling these problems is to view the Web as a collection of (usually small and heterogeneous) databases, and to view programs that extract and process Web data au...
Gustavo O. Arocena, Alberto O. Mendelzon
CIKM
2005
Springer
15 years 9 months ago
Structure-based query-specific document summarization
Summarization of text documents is increasingly important with the amount of data available on the Internet. The large majority of current approaches view documents as linear sequ...
Ramakrishna Varadarajan, Vagelis Hristidis
SEBD
2007
15 years 5 months ago
Disambiguation of Structure-Based Information in the STRIDER System
We present the current version of STRIDER, a versatile system for the disambiguation of structure-based information like XML schemas, structures of XML documents and web director...
Federica Mandreoli, Riccardo Martoglia, Enrico Ron...
CIKM
2009
Springer
15 years 10 months ago
Compact full-text indexing of versioned document collections
We study the problem of creating highly compressed full-text index structures for versioned document collections, that is, collections that contain multiple versions of each docume...
Jinru He, Hao Yan, Torsten Suel
WWW
2011
ACM
14 years 10 months ago
Identifying primary content from web pages and its application to web search ranking
Web pages are usually highly structured documents. In some documents, content with different functionality is laid out in blocks, some merely supporting the main discourse. In ot...
Srinivas Vadrevu, Emre Velipasaoglu