Search engines use content and link information to crawl, index, retrieve, and rank Web pages. The correlations between similarity measures based on these cues and on semantic ass...
In this paper, we present a novel near-duplicate document detection method that can easily be tuned for a particular domain. Our method represents each document as a real-valued s...
Hannaneh Hajishirzi, Wen-tau Yih, Aleksander Kolcz
The field of algorithms for pairwise biosequence similarity search is dominated by heuristic methods of high efficiency but uncertain sensitivity. One reason that more formal stri...
Collaborative annotation tools are in widespread use. The metadata from these systems can be mined to induce semantic relationships among Web objects (sites, pages, tags, concepts...
Visual understanding is often based on measuring similarity between observations. Learning similarities specific to a certain perception task from a set of examples has been show...
Michael Bronstein, Alexander Bronstein, Nikos Para...