Collection selection has been a research issue for years. Typically, in related work, precomputed statistics are employed in order to estimate the expected result quality of each ...
Matthias Bender, Sebastian Michel, Peter Triantafi...
Consumer health information written by health care professionals is often inaccessible to the consumers it is written for. Traditional readability formulas examine syntactic featu...
Trudi Miller, Gondy Leroy, Samir Chatterjee, Jie F...
Many government organizations publish a variety of data on the web to facilitate transparency. The multitude of sources has resulted in heterogeneous structures and formats as wel...
It is important yet hard to identify navigational queries in Web search due to a lack of sufficient information in Web queries, which are typically very short. In this paper we st...
Crawling the web is deceptively simple: the basic algorithm is (a) Fetch a page (b) Parse it to extract all linked URLs (c) For all the URLs not seen before, repeat (a)?(c). Howev...