In this paper, we model the pairwise similarities of a set of documents as a weighted network with a single cutoff parameter. Such a network can be thought of as an ensemble of unweighted graphs, each consisting of the edges with weights greater than the cutoff value. We look at this network ensemble as a complex system with a temperature parameter, and refer to it as a Latent Network. Our experiments on a number of datasets from two different domains show that certain properties of latent networks, such as clustering coefficient, average shortest path, and connected components, exhibit patterns that diverge significantly from those of randomized networks. We explain that these patterns reflect the network phase transition as well as the existence of a community structure in document collections. Using numerical analysis, we show that the aforementioned network properties can be used to predict the clustering Normalized Mutual Information (NMI) with high correlation (ρ > 0.9). Finally we sho...
Vahed Qazvinian, Dragomir R. Radev
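
The cutoff construction described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the similarity values, the names `SIM`, `threshold_graph`, and `count_components`, and the choice of cutoffs are all hypothetical, and the abstract does not say which similarity measure is used. The sketch keeps only edges whose weight is strictly greater than the cutoff, then counts connected components, one of the network properties the abstract mentions.

```python
from collections import deque

# Hypothetical pairwise similarity matrix for four documents (symmetric).
# These values are made up for illustration only.
SIM = [
    [1.0, 0.9, 0.2, 0.1],
    [0.9, 1.0, 0.3, 0.2],
    [0.2, 0.3, 1.0, 0.8],
    [0.1, 0.2, 0.8, 1.0],
]


def threshold_graph(sim, cutoff):
    """Return an unweighted graph (adjacency sets) that keeps only the
    edges whose similarity is strictly greater than the cutoff."""
    n = len(sim)
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if sim[i][j] > cutoff:
                adj[i].add(j)
                adj[j].add(i)
    return adj


def count_components(adj):
    """Count connected components with a breadth-first search."""
    seen = set()
    components = 0
    for start in adj:
        if start in seen:
            continue
        components += 1
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neighbor in adj[node]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
    return components


if __name__ == "__main__":
    # Sweep the cutoff to walk through the ensemble of unweighted graphs.
    for cutoff in (0.0, 0.25, 0.5, 0.85, 0.95):
        graph = threshold_graph(SIM, cutoff)
        print(cutoff, count_components(graph))
```

Raising the cutoff removes weaker edges, so the graph breaks into more components. On this toy matrix the two similar document pairs stay connected longest, which is the kind of community structure the abstract refers to.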