Sciweavers

WWW
2011
ACM

Inverted index compression via online document routing

Modern search engines are expected to make documents searchable shortly after they appear on the ever-changing Web. To satisfy this requirement, the Web is frequently crawled. Due to the sheer size of their indexes, search engines distribute the crawled documents among thousands of servers in a scheme called local index-partitioning, such that each server indexes only several million pages. To ensure that documents from the same host (e.g., www.nytimes.com) are distributed uniformly over the servers for load-balancing purposes, random routing of documents to servers is common. To shorten the time until documents become searchable after being crawled, documents may simply be appended to the existing index partitions. However, indexing by merely appending documents results in larger index sizes, since document reordering for index compactness is no longer performed. This, in turn, degrades search query processing performance, which depends heavily on index sizes. A possible way to balance quick d...
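The following Python sketch illustrates the two ideas the abstract relies on: random routing of documents to index partitions, and the effect of document order on compressed index size. It is not taken from the paper. The function names (route_randomly, varbyte_len, posting_list_bytes) are hypothetical, and the use of variable-byte encoded gaps is an assumption chosen as a common, simple posting-list compression scheme.

```python
import random


def route_randomly(doc_ids, num_servers, seed=0):
    """Assign each document to a uniformly random server (index partition)."""
    rng = random.Random(seed)
    partitions = [[] for _ in range(num_servers)]
    for doc_id in doc_ids:
        partitions[rng.randrange(num_servers)].append(doc_id)
    return partitions


def varbyte_len(n):
    """Bytes needed to variable-byte encode a positive integer (7 payload bits per byte)."""
    length = 1
    while n >= 128:
        n >>= 7
        length += 1
    return length


def posting_list_bytes(positions):
    """Size of one term's posting list, stored as variable-byte encoded gaps
    between the sorted local positions of the documents containing the term."""
    total, prev = 0, -1
    for pos in sorted(positions):
        total += varbyte_len(pos - prev)
        prev = pos
    return total


# Assumed toy scenario: one term occurs in 1,000 of a partition's 1,000,000 documents.
clustered = posting_list_bytes(range(1000))                # similar documents stored adjacently
scattered = posting_list_bytes(range(0, 1_000_000, 1000))  # same documents spread evenly
print(clustered, scattered)
```

In this toy scenario the scattered layout needs roughly twice the bytes of the clustered one for the same term. This is the kind of compactness that is given up when documents are appended in arrival order without reordering.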
Gal Lavee, Ronny Lempel, Edo Liberty, Oren Somekh
Added 17 May 2011
Updated 17 May 2011
Type Journal
Year 2011
Where WWW