We propose an algorithm for extracting fields from HTML search results. The output of the algorithm is a database table– a data structure that better lends itself to high-level...
We present a discriminative model that directly predicts which set of phrasal translation rules should be extracted from a sentence pair. Our model scores extraction sets: nested ...
We demonstrate the effectiveness of multilingual learning for unsupervised part-of-speech tagging. The key hypothesis of multilingual learning is that by combining cues from multi...
Benjamin Snyder, Tahira Naseem, Jacob Eisenstein, ...
This paper describes a novel Bayesian approach to unsupervised topic segmentation. Unsupervised systems for this task are driven by lexical cohesion: the tendency of wellformed se...
Abstract. This paper presents a simple, yet effective method of building a codebook for pairs of spatially close SIFT descriptors. Integrating such codebook into the popular bag-o...