Abstract. This work describes a methodology for identifying structure and communication patterns within an organization from e-mail data. The method first constructs an e-mail graph; we then show experimentally that the adjacency matrix of this graph is well approximated by a low-rank matrix. This low-rank property indicates that Principal Component Analysis (PCA) techniques can be used to remove noise and extract structural information (e.g., user communities and communication patterns). We further show that the degree distributions of the e-mail graph, for both indegrees and outdegrees, follow power laws, and that a giant component connects 70% of the nodes.
Petros Drineas, Mukkai S. Krishnamoorthy, Michael
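The low-rank step described in the abstract can be illustrated with a minimal sketch. It rests on several assumptions. Edges are weighted by message count; the abstract does not specify a weighting. The rank-k approximation is computed with a truncated SVD, which is the uncentered form of PCA, since classical PCA would first subtract column means. The helper names `build_adjacency`, `low_rank_approx`, and `relative_error` are hypothetical and do not come from the paper. The demo uses a synthetic block matrix with added Gaussian noise, purely for illustration, not data from the study.

```python
import numpy as np


def build_adjacency(edges, n_users):
    """Directed e-mail graph: A[i, j] counts messages sent by user i to user j.

    Assumption: message-count weights; a 0/1 adjacency matrix would also work.
    """
    A = np.zeros((n_users, n_users))
    for sender, recipient in edges:
        A[sender, recipient] += 1.0
    return A


def low_rank_approx(A, k):
    """Best rank-k approximation of A in the Frobenius norm, via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k largest singular triplets (scale columns of U by s, then project).
    return (U[:, :k] * s[:k]) @ Vt[:k, :]


def relative_error(A, k):
    """||A - A_k||_F / ||A||_F; small values mean A is well approximated at rank k."""
    A_k = low_rank_approx(A, k)
    return np.linalg.norm(A - A_k, "fro") / np.linalg.norm(A, "fro")


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical organization: two 20-person communities that e-mail internally,
    # giving a rank-2 "structure" matrix, plus Gaussian noise.
    structure = 10.0 * np.kron(np.eye(2), np.ones((20, 20)))
    noisy = structure + rng.normal(size=structure.shape)
    for k in (1, 2, 5):
        print(f"rank {k}: relative error {relative_error(noisy, k):.3f}")
    denoised = low_rank_approx(noisy, 2)
    # Distance to the underlying structure before and after the rank-2 projection.
    print(np.linalg.norm(noisy - structure), np.linalg.norm(denoised - structure))
```

Projecting onto the top singular directions discards most of the noise energy, which lies outside the low-dimensional subspace carrying the community structure. This is the sense in which a low-rank property supports denoising.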