Sciweavers

281 search results - page 20 / 57
» Introducing the Enron Corpus
PVLDB
2008
13 years 7 months ago
WebTables: exploring the power of tables on the web
The World-Wide Web consists of a huge number of unstructured documents, but it also contains structured data in the form of HTML tables. We extracted 14.1 billion HTML tables from...
Michael J. Cafarella, Alon Y. Halevy, Daisy Zhe Wa...
CIKM
2010
Springer
13 years 6 months ago
Decomposing background topics from keywords by principal component pursuit
Low-dimensional topic models have been proven very useful for modeling a large corpus of documents that share a relatively small number of topics. Dimensionality reduction tools s...
Kerui Min, Zhengdong Zhang, John Wright, Yi Ma
ICADL
2007
Springer
13 years 11 months ago
Keyphrase Extraction in Scientific Publications
We present a keyphrase extraction algorithm for scientific publications. Different from previous work, we introduce features that capture the positions of phrases in docu...
Thuy Dung Nguyen, Min-Yen Kan
ICML
2005
IEEE
14 years 8 months ago
Evaluating machine learning for information extraction
Comparative evaluation of Machine Learning (ML) systems used for Information Extraction (IE) has suffered from various inconsistencies in experimental procedures. This paper repor...
Neil Ireson, Fabio Ciravegna, Mary Elaine Califf, ...
VL
2008
IEEE
14 years 1 months ago
Towards end-user web software visualization
Software visualization has always been expensive, special purpose, and hard to program. Most of the existing software visualization tools require too much time for end-user develop...
Craig Anslow, James Noble, Stuart Marshall, Ewan D...