The popularity of email has triggered researchers to look for ways to help users better organize the enormous amount of information stored in their email folders. One challenge that has not been studied extensively in text mining is the identification and reconstruction of hidden emails. A hidden email is an original email that has been quoted in at least one email in a folder, but does not present itself in the same folder. It may have been (un)intentionally deleted or may never have been received. The discovery and reconstruction of hidden emails is critical for many applications including email classification, summarization and forensics. This paper proposes a framework for reconstructing hidden emails using the embedded quotations found in messages further down the thread hierarchy. We evaluate the robustness and scalability of our framework by using the Enron public email corpus. Our experiments show that hidden emails exist widely in that corpus and also that our optimization te...
Giuseppe Carenini, Raymond T. Ng, Xiaodong Zhou