In this paper, we study the overall structure of link-based spam and its evolution, which would be helpful for the development of robust analysis tools and research for Web spamming a...
One of the most important steps in web crawling is determining the starting points, or seed selection. This paper identifies and explores the problem of seed selection in webscal...
New types of document collections are being developed by various web services. The service providers keep track of non-textual features such as click counts. In this paper, we pre...
Jiwoon Jeon, W. Bruce Croft, Joon Ho Lee, Soyeon P...
Current projects that automate the collection of provenance information use a centralized architecture for managing the resulting metadata; that is, provenance is gathered at rem...
Web pages contain a combination of unique content and template material, which is present across multiple pages and used primarily for formatting, navigation, and branding. We stu...