Current relational databases require that a database schema exist prior to data entry and require manual optimization for best performance. We describe the query optimization tech...
Scott M. Meyer, Jutta Degener, John Giannandrea, B...
We present TwitterMonitor, a system that performs trend detection over the Twitter stream. The system identifies emerging topics (i.e., ‘trends’) on Twitter in real time and p...
HadoopDB is a hybrid of MapReduce and DBMS technologies, designed to meet the growing demand of analyzing massive datasets on very large clusters of machines. Our previous work ha...
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, the output of each MapReduce task and job is materialized to ...
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M....
We propose PASTE, the first differentially private aggregation algorithms for distributed time-series data that offer good practical utility without any trusted server. PASTE add...
Scientific data offers some of the most interesting challenges in data integration today. Scientific fields evolve rapidly and accumulate masses of observational and experiment...
Partha Pratim Talukdar, Zachary G. Ives, Fernando ...
The probabilistic threshold query (PTQ) is one of the most common queries in uncertain databases, where all results satisfying the query with probabilities that meet the threshold...
Maximal clique enumeration (MCE) is a fundamental problem in graph theory and has important applications in many areas such as social network analysis and bioinformatics. The prob...
James Cheng, Yiping Ke, Ada Wai-Chee Fu, Jeffrey X...