Parallel web pages are an important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
clustering of documents according to sharing of topics at multiple levels of abstraction. Given a corpus of documents, a posterior inference algorithm finds an approximation to a ...
David M. Blei, Thomas L. Griffiths, Michael I. Jor...
We present a polytope-kernel density estimation (PKDE) methodology that allows us to perform exact mean-shift updates along the edges of the Delaunay graph of the data. We discuss...
XML is fast becoming the standard format for storing, exchanging, and publishing data over the web, and is increasingly embedded in applications. Two challenges in handling XML are its size (the XML...
Paolo Ferragina, Fabrizio Luccio, Giovanni Manzini...
We define a deterministic metric of "well-behaved data" that enables searching along the lines of interpolation search. Specifically, define to be the ratio of distance...