Many web documents (such as Java FAQs) are being replicated on the Internet. Often entire document collections (such as hyperlinked Linux manuals) are being replicated many times....
General image retrieval is often carried out by a text-based search engine, such as Google Image Search. In this case, natural language queries are used as input to the search eng...
Many web applications use a mixture of HTML and scripting language code as the front-end to business services. Analogously to traditional applications, redundant code is introduce...
General purpose Web search engines are becoming ineffective due to the rapid growth and changes in the contents of the World Wide Web. Meta-search engines help a bit by having a b...
Leo Yuen, Matthew Chang, Ying Kit Lai, Chung Keung...
User clicks on a URL in response to a query are extremely useful predictors of the URL's relevance to that query. Exact match click features tend to suffer from severe data s...
Huihsin Tseng, Longbin Chen, Fan Li, Ziming Zhuang...