Documents formatted in eXtensible Markup Language (XML) are becoming increasingly available in collections of various document types. In this paper, we present an approach for the...
Massih-Reza Amini, Anastasios Tombros, Nicolas Usu...
We present a method for learning to find English to Chinese transliterations on the Web. In our approach, proper nouns are expanded into new queries aimed at maximizing the probab...
The Dirichlet compound multinomial (DCM) distribution, also called the multivariate Polya distribution, is a model for text documents that takes into account burstiness: the fact ...
Abstract. This paper describes an efficient method to construct reliable machine learning applications in peer-to-peer (P2P) networks by building ensemble based meta methods. We co...
Extracting titles from a PDFs full text is an important task in information retrieval to identify PDFs. Existing approaches apply complicated and expensive (in terms of calculating...