We study in this paper the Web forum crawling problem, which is a very fundamental step in many Web applications, such as search engine and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...
Technology in the field of digital media generates huge amounts of non-textual information, audio, video, and images, along with more familiar textual information. The potential f...
The Web is a dynamic, ever changing collection of information. This paper explores changes in Web content by analyzing a crawl of 55,000 Web pages, selected to represent different...
Eytan Adar, Jaime Teevan, Susan T. Dumais, Jonatha...
—Typical information extraction (IE) systems can be seen as tasks assigning labels to words in a natural language sequence. The performance is restricted by the availability of l...
Yanjun Qi, Pavel Kuksa, Ronan Collobert, Kunihiko ...
This paper uncovers a new phenomenon in web search that we call domain bias — a user’s propensity to believe that a page is more relevant just because it comes from a particul...
Samuel Ieong, Nina Mishra, Eldar Sadikov, Li Zhang