Researchers struggle to manage vast amounts of data coming from hundreds of sources in online repositories. To successfully conduct research studies, researchers need to find, ret...
David W. Embley, Stephen W. Liddle, Deryle W. Lons...
We study in this paper the Web forum crawling problem, which is a very fundamental step in many Web applications, such as search engine and Web data mining. As a typical user-crea...
Rui Cai, Jiang-Ming Yang, Wei Lai, Yida Wang, Lei ...
The non-English Web is growing at breakneck speed, but available language processing tools are mostly English based. Taxonomies are a case in point: while there are plenty of comm...
Xuerui Wang, Andrei Z. Broder, Evgeniy Gabrilovich...
Word space models, in the sense of vector space models built on distributional data taken from texts, are used to model semantic relations between words. We argue that the high dim...
Web search engines are facing formidable performance challenges due to data sizes and query loads. The major engines have to process tens of thousands of queries per second over t...