Wikipedia is the largest monolithic repository of human knowledge. In addition to its sheer size, it represents a new encyclopedic paradigm by interconnecting articles through hyp...
Many applications require analyzing vast amounts of textual data, but the size and inherent noise of such data can make processing very challenging. One approach to these issues i...
David G. Underhill, Luke McDowell, David J. Marche...
We assess the current state of the art in speech summarization by comparing a typical summarizer on two different domains: lecture data and the SWITCHBOARD corpus. Our results ca...
Large-scale learning is often realistic only in a semi-supervised setting, where a small set of labeled examples is available together with a large collection of unlabeled data. In...
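The semi-supervised setting described above (few labeled examples, many unlabeled ones) can be illustrated with self-training, one common strategy in that family; the abstract is truncated, so this is not necessarily the paper's method. A minimal sketch, assuming toy 2-D points and a nearest-centroid base learner (all names and data here are illustrative):

```python
# Hedged sketch of self-training: iteratively pseudo-label the most
# confident unlabeled points and fold them into the labeled set.
# Illustrative only; not the method of the truncated abstract above.
import math

def centroid(points):
    """Mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))

def self_train(labeled, unlabeled, rounds=3):
    """labeled: list of (point, label); unlabeled: list of points.
    Each round, score unlabeled points by the margin between their two
    nearest class centroids and adopt the most confident half."""
    labeled = list(labeled)
    unlabeled = list(unlabeled)
    for _ in range(rounds):
        if not unlabeled:
            break
        # Recompute one centroid per class from the current labeled set.
        cents = {lab: centroid([p for p, l in labeled if l == lab])
                 for lab in {l for _, l in labeled}}
        scored = []
        for p in unlabeled:
            ds = sorted((math.dist(p, c), lab) for lab, c in cents.items())
            # Confidence = gap between nearest and second-nearest centroid.
            margin = ds[1][0] - ds[0][0] if len(ds) > 1 else ds[0][0]
            scored.append((margin, p, ds[0][1]))
        scored.sort(reverse=True)
        for _, p, lab in scored[: max(1, len(scored) // 2)]:
            labeled.append((p, lab))
            unlabeled.remove(p)
    return labeled
```

With two well-separated clusters, a single labeled seed per class is enough for the pseudo-labels to propagate to the remaining points over a few rounds.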
We propose strategies to efficiently execute a query workload, which consists of multiple related queries submitted against a scientific dataset, on a distributed-memory system in...