We present Opal, a light-weight framework for interactively locating missing web pages (http status code 404). Opal is an example of “in vivo” preservation: harnessing the col...
Similarity-based search has been a key factor for many applications such as multimedia retrieval, data mining, Web search and retrieval, and so on. There are two important issues r...
We introduce the Multiple Relationship Similarity Spreading Algorithm (MRSSA) to enhance IR effectiveness. This method has similarity computed in an iterative “spreading” fash...
Fully automatic methods that extract lists of objects from the Web have been studied extensively. Record extraction, the first step of this object extraction process, identifies...
In world wide web, a document is usually made up of multiple pages, each one of which has a unique URL address and links to each other by hyperlink pointers. Related documents are...