The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
This paper introduces the problem of matching people's names to their corresponding social network identities such as their Twitter accounts. Existing tools for this purpose build u...
Gae-won You, Seung-won Hwang, Zaiqing Nie, Ji-Rong...
PCDB (http://www.pcdb.unq.edu.ar) is a database of protein conformational diversity. For each protein, the database contains the redundant compilation of all the corresponding cry...
Ezequiel I. Juritz, Sebastian Fernandez Alberti, G...
Document summarization plays an increasingly important role with the exponential growth of documents on the Web. Many supervised and unsupervised approaches have been proposed to ...
Liangda Li, Ke Zhou, Gui-Rong Xue, Hongyuan Zha, Y...
There is a significant need to extract and analyse the text in images on Web documents, for effective indexing, semantic analysis and even presentation by non-visual means (e.g....