Sciweavers

ACL
1997
String Transformation Learning
String transformation systems have been introduced in (Brill, 1995) and have several applications in natural language processing. In this work we consider the computational proble...
Giorgio Satta, John C. Henderson
WWW
2006
ACM
Random sampling from a search engine's index
We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search...
Ziv Bar-Yossef, Maxim Gurevich
CLEF
2006
Springer
MSRA Columbus at GeoCLEF 2006
This paper describes the participation of Columbus Project of Microsoft Research Asia (MSRA) in the GeoCLEF 2006 (a cross-language geographical retrieval track which is part of Cr...
Zhisheng Li, Chong Wang 0002, Xing Xie, Xufa Wang,...
WWW
2007
ACM
Efficient search engine measurements
We address the problem of measuring global quality metrics of search engines, like corpus size, index freshness, and density of duplicates in the corpus. The recently proposed est...
Ziv Bar-Yossef, Maxim Gurevich
TSD
2007
Springer
On the Relative Hardness of Clustering Corpora
Clustering is often considered the most important unsupervised learning problem and several clustering algorithms have been proposed over the years. Many of these algorit...
David Pinto, Paolo Rosso