Duplicate URLs have brought serious troubles to the whole pipeline of a search engine, from crawling, indexing, to result serving. URL normalization is to transform duplicate URLs...
Tao Lei, Rui Cai, Jiang-Ming Yang, Yan Ke, Xiaodon...
In this paper, we cast the image-ranking problem into the task of identifying "authority" nodes on an inferred visual similarity graph and propose an algorithm to analyz...
Breaking news often contains timely definitions and descriptions of current terms, organizations and personalities. We utilize such web sources to construct definitions for such t...
Recent work on incremental crawling has enabled the indexed document collection of a search engine to be more synchronized with the changing World Wide Web. However, this synchron...
Lipyeow Lim, Min Wang, Sriram Padmanabhan, Jeffrey...
As computers and Internet become more and more available to families, access of objectionable graphics by children is increasingly a problem that many parents are concerned about....
James Ze Wang, Jia Li, Gio Wiederhold, Oscar Firsc...