This paper offers a novel look at using a dimensionality-reduction technique called simhash [8] to detect similar document pairs in large-scale collections. We show that this algo...
Most Twitter search systems treat a tweet as plain text when modeling relevance. However, a series of conventions allows users to tweet in structural ways using combin...
Zhunchen Luo, Miles Osborne, Sasa Petrovic, Ting W...
Although popular text search engines allow users to retrieve similar web pages, source code search engines do not have this feature. Detecting similar applications is a notoriou...
In this paper, we propose a Web-based information sharing system called Proxy Agent-based Information Sharing (PAIS). We also developed a writable Web mechanism called Web bro...
Recognizing the alternative ways people use to reference an entity is important for many Web applications that query structured data. In such applications, there is often a mismat...