The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Our work is motivated by the problem of ranking hyperlinked documents for a given query. Given an arbitrary directed graph with edge and node labels, we present a new flow-based ...
The Web is nowadays moving from a Web of data to a Web of services. In this paper we present our approach for Web Service discovery at Web scale, targeted to support flexible and ...
A parallel corpus is a rich linguistic resource for various multilingual text management tasks, including cross-lingual text retrieval, multilingual computational linguistics and mul...
It is essential that users surfing the Internet have web pages classified into a given topic as correctly as possible. Toward this end, this paper presents a topic-...
Sanguk Noh, Youngsoo Choi, Haesung Seo, Kyunghee C...