Abstract—Despite recent efforts to characterize complex networks such as citation graphs or online social networks (OSNs), little attention has been given to developing tools that can be used to characterize directed graphs in the wild, where no pre-processed data is available. The presence of hidden incoming edges but observable outgoing edges poses a challenge to characterize large directed graphs through crawling, as existing sampling methods cannot cope with hidden incoming links. The driving principle behind our random walk (RW) sampling method is to construct, in real-time, an undirected graph from the directed graph such that the random walk on the directed graph is consistent with one on the undirected graph. We then use the RW on the undirected graph to estimate the outdegree distribution. Our algorithm accurately estimates outdegree distributions of a variety of real world graphs. We also study the hardness of indegree distribution estimation when indegrees are latent (i.e....
Bruno F. Ribeiro, Pinghui Wang, Fabricio Murai, Do