In this paper we use a Unified Relationship Matrix (URM) to represent a set of heterogeneous data objects (e.g., web pages, queries) and their interrelationships (e.g., hyperlinks, user clickthrough sequences). We claim that iterative computations over the URM can help overcome the data sparseness problem and detect latent relationships among heterogeneous data objects, thus, can improve the quality of information applications that require combination of information from heterogeneous sources. To support our claim, we present a unified similarity-calculating algorithm, SimFusion. By iteratively computing over the URM, SimFusion can effectively integrate relationships from heterogeneous sources when measuring the similarity of two data objects. Experiments based on a web search engine query log and a web page collection demonstrate that SimFusion can improve similarity measurement of web objects over both traditional content based algorithms and the cutting edge SimRank algorithm. Cate...
Wensi Xi, Edward A. Fox, Weiguo Fan, Benyu Zhang,