The aim of query-based sampling is to obtain a sufficient, representative sample of an underlying (text) collection. Current measures for assessing sample quality are too coarse gr...
Feature selection methods are often applied in the context of document classification. They are particularly important for processing large data sets that may contain millions of...
Janez Brank, Dunja Mladenic, Marko Grobelnik, Nata...
The study of transportability aims to identify conditions under which causal information learned from experiments can be reused in a different environment where only passive obser...
The traditional crawlers used by search engines to build their collection of Web pages frequently gather unmodified pages that already exist in their collection. This creates unne...
In this paper we consider the problem of collecting a large amount of data from several different hosts to a single destination in a wide-area network. Often, due to congestion...
William C. Cheng, Cheng-Fu Chou, Leana Golubchik, ...