Sciweavers

» Minimal document set retrieval
WWW
2008
ACM
14 years 8 months ago
Performance of compressed inverted list caching in search engines
Due to the rapid growth in the size of the web, web search engines are facing enormous performance challenges. The larger engines in particular have to be able to process tens of ...
Jiangong Zhang, Xiaohui Long, Torsten Suel
WWW
2005
ACM
14 years 8 months ago
Extracting context to improve accuracy for HTML content extraction
Web pages contain clutter (such as ads, unnecessary images and extraneous links) around the body of an article, which distracts a user from actual content. Extraction of "use...
Suhit Gupta, Gail E. Kaiser, Salvatore J. Stolfo
WWW
2010
ACM
14 years 2 months ago
Not so creepy crawler: easy crawler generation with standard XML queries
Web crawlers are increasingly used for focused tasks such as the extraction of data from Wikipedia or the analysis of social networks like last.fm. In these cases, pages are far m...
Franziska von dem Bussche, Klara A. Weiand, Benedi...
DASFAA
2008
IEEE
14 years 1 months ago
Summarization Graph Indexing: Beyond Frequent Structure-Based Approach
A graph is an important data structure for modeling complex structural data, such as chemical compounds, proteins, and XML documents. Among many graph data-based applications, sub-graph ...
Lei Zou, Lei Chen 0002, Huaming Zhang, Yansheng Lu...
PODS
2008
ACM
14 years 7 months ago
Local Hoare reasoning about DOM
The W3C Document Object Model (DOM) specifies an XML update library. DOM is written in English, and is therefore not compositional and not complete. We provide a first step toward...
Philippa Gardner, Gareth Smith, Mark J. Wheelhouse...