We consider the problem of efficiently sampling Web search engine query results. In turn, using a small random sample instead of the full set of results leads to efficient approxi...
Aris Anagnostopoulos, Andrei Z. Broder, David Carm...
The University of California, Santa Cruz, Genome Browser Database (GBD) provides integrated sequence and annotation data for a large collection of vertebrate and model organism ge...
Donna Karolchik, Robert M. Kuhn, Robert Baertsch, ...
In this study, we formalize a multi-focal learning problem, where training data are partitioned into several different focal groups and the prediction model will be learned within...
Yong Ge, Hui Xiong, Wenjun Zhou, Ramendra K. Sahoo...
We describe a data mining system to detect frauds that are camouflaged to look like normal activities in domains with high number of known relationships. Examples include accounti...
Graphs or networks can be used to model complex systems. Detecting community structures from large network data is a classic and challenging task. In this paper, we propose a nove...