This paper offers a novel look at using a dimensionalityreduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
whose titles and abstracts sound very interesting, the pile of unread reports continues to grow on the table in my office." (How quaint the terminology: mail and electronic me...
This work evaluates a few search strategies for Arabic monolingual and cross-lingual retrieval, using the TREC Arabic corpus as the test-bed. The release by NIST in 2001 of an Ara...
As a principled approach to capturing semantic relations of words in information retrieval, statistical translation models have been shown to outperform simple document language m...
The Medical Article Records System (MARS) developed by the Lister Hill National Center for Biomedical Communications uses scanning, OCR and automated recognition and reformatting ...
Susan E. Hauser, Jonathan Schlaifer, Tehseen F. Sa...