Redirection spam presents a web page with false content to a crawler for indexing, but automatically redirects the browser to a different web page. Redirection is usually immediat...
Scalability is the key issue in making content-based copy detection (CBCD) methods practical for very large image and video databases. Since copies are transformed versions of ori...
People often use powerful tools to manage the documents they encounter, but very rarely to store the mental knowledge they glean from those documents. Popcorn is a personal knowle...
Stephen Davies, Scotty Allen, Jon Raphaelson, Emil...
We present Opal, a light-weight framework for interactively locating missing web pages (http status code 404). Opal is an example of “in vivo” preservation: harnessing the col...
We present an approach for detecting link spam common in blog comments by comparing the language models used in the blog post, the comment, and pages linked by the comments. In co...