Real-world data -- especially when generated by distributed measurement infrastructures such as sensor networks -- tends to be incomplete, imprecise, and erroneous, making it impo...
Background: Recent progress in genotyping technologies allows the generation of high-density genetic maps using hundreds of thousands of genetic markers for each DNA sample. The ava...
Many modern enterprises are collecting data at the most detailed level possible, creating data repositories ranging from terabytes to petabytes in size. The ability to apply sophi...
Sudipto Das, Yannis Sismanis, Kevin S. Beyer, Rain...
Random sampling is one of the most fundamental data management tools available. However, most current research involving sampling considers the problem of how to use a sample, and...
Current semi-structured keyword search and natural language query processing systems use ad hoc approaches to take advantage of structural information. Although intuitive, they ar...