Sciweavers

43 search results - page 5 / 9
» Creating a Persian-English Comparable Corpus
WWW
2006
ACM
14 years 8 months ago
Random sampling from a search engine's index
We revisit a problem introduced by Bharat and Broder almost a decade ago: how to sample random pages from the corpus of documents indexed by a search engine, using only the search...
Ziv Bar-Yossef, Maxim Gurevich
ACMSE
2009
ACM
14 years 2 months ago
Applying randomized projection to aid prediction algorithms in detecting high-dimensional rogue applications
This paper describes a research effort to improve the use of the cosine similarity information retrieval technique to detect unknown, known or variances of known rogue software by...
Travis Atkison
AI
2006
Springer
13 years 11 months ago
Unsupervised Named-Entity Recognition: Generating Gazetteers and Resolving Ambiguity
In this paper, we propose a named-entity recognition (NER) system that addresses two major limitations frequently discussed in the field. First, the system requires no human interv...
David Nadeau, Peter D. Turney, Stan Matwin
ACL
2010
13 years 5 months ago
Cross Lingual Adaptation: An Experiment on Sentiment Classifications
In this paper, we study the problem of using an annotated corpus in English for the same natural language processing task in another language. While various machine translation sy...
Bin Wei, Christopher Pal
ICDAR
2009
IEEE
14 years 2 months ago
Automated Ground Truth Data Generation for Newspaper Document Images
In document image understanding, public datasets with ground-truth are an important part of scientific work. They are not only helpful for developing new methods, but also provid...
Thomas Strecker, Joost van Beusekom, Sahin Albayra...