Tables are ubiquitous in web pages and scientific documents. With the explosive development of the web, tables have become a valuable information repository. Therefore, effective...
MapReduce is a popular framework for data-intensive distributed computing of batch jobs. To simplify fault tolerance, many implementations of MapReduce materialize the entire outp...
Tyson Condie, Neil Conway, Peter Alvaro, Joseph M....
Data warehousing is a software infrastructure which supports OLAP applications by providing a collection of tools which allow data extraction and cleaning, data integration and ag...
We study the problem of maintaining a sketch of recent elements of a data stream. Motivated by applications involving network data, we consider streams that are asynchronous, in wh...
Information extraction from text databases is a useful paradigm to populate relational tables and unlock the considerable value hidden in plain-text documents. However, information...