The widespread use of email has raised serious privacy concerns. A critical issue is how to prevent email information leaks, i.e., cases where a message is accidentally addressed to unintended recipients. This is an increasingly common problem that can severely harm individuals and corporations: a single email leak can lead to expensive lawsuits, damage to brand reputation, negotiation setbacks, and severe financial losses. In this paper we present the first attempt to solve this problem. We begin by redefining it as an outlier detection task, where the unintended recipients are the outliers. We then combine real email examples (from the Enron Corpus) with carefully simulated leak-recipients to learn textual and network patterns associated with email leaks. This method detected email leaks in almost 82% of the test cases, significantly outperforming all baselines. More importantly, in a separate set of experiments we applied the proposed method to...
Vitor R. Carvalho, William W. Cohen