Keeping track of changes in user interests from a document stream with a few relevance judgments is not an easy task. To tackle this problem, we propose a novel method that integr...
Parallel dataflow programs generate enormous amounts of distributed data that are short-lived, yet are critical for completion of the job and for good run-time performance. We ca...
Steven Y. Ko, Imranul Hoque, Brian Cho, Indranil G...
The increasing gap in processor and memory speeds has forced microprocessors to rely on deep cache hierarchies to keep the processors from starving for data. For many applications...
The Hadoop filesystem is a large scale distributed filesystem used to manage and quickly process extremely large data sets. We want to utilize Hadoop to assist with dataintensive ...
Morphbank, an on-line collection of museum-quality biological images, is an NSF funded project designed to facilitate the on-line collaboration of biologists from around the world...