Efficient search in large textual collections with redundancy

Current web search engines focus on searching only the most recent snapshot of the web. In some cases, however, it would be desirable to search over collections that include many different crawls and versions of each page. One important example of such a collection is the Internet Archive, though there are many others. Since the data size of such an archive is multiple times that of a single snapshot, this presents us with significant performance challenges. Current engines use various techniques for index compression and optimized query execution, but these techniques do not exploit the significant similarities between different versions of a page, or between different pages. In this paper, we propose a general framework for indexing and query processing of archival collections and, more generally, any collections with a sufficient amount of redundancy. Our approach results in significant reductions in index size and query processing costs on such collections, and it is orthogonal to...

Jiangong Zhang, Torsten Suel

Internet Technology | Query Execution | Query Processing Costs | Search Engine Query | WWW 2007 |

Post Info

Added	21 Nov 2009
Updated	21 Nov 2009
Type	Conference
Year	2007
Where	WWW
Authors	Jiangong Zhang, Torsten Suel
