Random sampling is an appealing approach to building synopses of large data streams because random samples can be used for a broad spectrum of analytical tasks. Users are often inter...
Distributed Hash Tables (DHTs) provide a scalable solution for data sharing in P2P systems. To ensure high data availability, DHTs typically rely on data replication, yet without ...
Real-world data -- especially when generated by distributed measurement infrastructures such as sensor networks -- tends to be incomplete, imprecise, and erroneous, making it impo...
Schema matching is a critical problem for integrating heterogeneous information sources. Traditionally, the problem of matching multiple schemas has essentially relied on finding ...
Ranking is an important property that needs to be fully supported by current relational query engines. Recently, several rank-join query operators have been proposed based on rank...
Ihab F. Ilyas, Rahul Shah, Walid G. Aref, Jeffrey ...