Random sampling is a well-known technique for approximate processing of large datasets. We introduce a set of algorithms for incremental maintenance of large random samples on seco...
As more and more information is available in the Electronic Health Record in the form of free-text narrative, there is a need for automated tools, which can process and understand...
Abstract-- In recent years, data streams have become ubiquitous because of advances in hardware and software technology. The ability to adapt conventional mining problems to data s...
A representation of the World Wide Web as a directed graph, with vertices representing web pages and edges representing hypertext links, underpins the algorithms used by web search...
Many valuable text databases on the web have non-crawlable contents that are "hidden" behind search interfaces. Metasearchers are helpful tools for searching over many s...