A large fraction of the URLs on the web contain duplicate (or near-duplicate) content. De-duping URLs is an extremely important problem for search engines, since all the principal...
s: An Abstraction for Data Intensive Computing on Campus Grids Christopher Moretti, Hoang Bui, Karen Hollingsworth, Brandon Rich, Patrick Flynn, and Douglas Thain Department of Com...
In this paper, we describe methods to exploit search queries mined from search engine query logs to improve domain detection in spoken language understanding. We propose extending...
With the advent of open source software repositories, the data available for defect prediction in source files has increased tremendously. Although traditional statistics turned out t...
Mining feedback information from user click-through data is an important issue for modern Web retrieval systems in terms of architecture analysis, performance evaluation and algor...
Rongwei Cen, Yiqun Liu, Min Zhang, Bo Zhou, Liyun ...