: We describe our participation in the TREC 2008 Enterprise track and detail our language modeling-based approaches. For document search, our focus was on query expansion using pro...
This paper proposes a new algorithm that simultaneously identifies the coding system and language of a code string fetched from the Internet, especially World-Wide Web. The algori...
A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
Although the World Wide Web has of late become an important source to consult for the meaning of words, a number of technical terms related to high technology are not found on the...
XML has become the most useful standard of data interchange in the web and e-business world and there is a large amount of information stored in this format. Nonetheless, a large ...