The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
Low-dimensional topic models have proven very useful for modeling a large corpus of documents that share a relatively small number of topics. Dimensionality reduction tools s...
We present a keyphrase extraction algorithm for scientific publications. Unlike previous work, we introduce features that capture the positions of phrases in docu...
Comparative evaluation of Machine Learning (ML) systems used for Information Extraction (IE) has suffered from various inconsistencies in experimental procedures. This paper repor...
Neil Ireson, Fabio Ciravegna, Mary Elaine Califf, ...
Software visualization has always been expensive, special-purpose, and hard to program. Most of the existing software visualization tools require too much time for end-user develop...
Craig Anslow, James Noble, Stuart Marshall, Ewan D...