The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Many databases have become Web-accessible through form-based search interfaces (i.e., HTML forms) that allow users to specify complex and precise queries to access the underlying ...
Hai He, Weiyi Meng, Yiyao Lu, Clement T. Yu, Zongh...
This article describes how to implement efficient memory resident path indexes for semi-structured data. Two techniques are introduced, and they are shown to be significantly fas...
Based on a systematic study of the semantics of temporal attributes of entities, this paper provides new guidelines for the design of temporal relational databases. The notions of ...
Searching and extracting meaningful information out of highly heterogeneous datasets is a hot topic that received a lot of attention. However, the existing solutions are based on e...