With the need to make sense out of large and constantly growing information spaces, tools to support information management are becoming increasingly valuable. In prior work we pr...
We consider the problem of DUST: Different URLs with Similar Text. Such duplicate URLs are prevalent in web sites, as web server software often uses aliases and redirections, and...
Collaborative Filtering (CF) algorithms, used to build web-based recommender systems, are often evaluated in terms of how accurately they predict user ratings. However, current eva...
Neal Lathia, Stephen Hailes, Licia Capra, Xavier A...
With the proliferation of online distribution methods for videos, content owners require easier and more effective methods for monetization through advertising. Matching advertis...
In this paper we study the Web forum crawling problem, a fundamental step in many Web applications such as search engines and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...