A distributed system is described that reliably mines parallel text from large corpora. The approach can be regarded as cross-language near-duplicate detection, enabled by an init...
Jakob Uszkoreit, Jay Ponte, Ashok C. Popat, Moshe ...
The ability to determine what day-to-day activity (such as cooking pasta, taking a pill, or watching a video) a person is performing is of interest in many application domains. A ...
Mike Perkowitz, Matthai Philipose, Kenneth P. Fish...
It is crucial for cross-language information retrieval (CLIR) systems to deal with the translation of unknown queries1 due to that real queries might be short. The purpose of this...
We present a novel method for discovering and modeling the relationship between informal Chinese expressions (including colloquialisms and instant-messaging slang) and their forma...
Set expansion refers to expanding a partial set of “seed” objects into a more complete set. One system that does set expansion is SEAL (Set Expander for Any Language), which e...