Sciweavers

CEAS
2004
Springer

Learning to Extract Signature and Reply Lines from Email

We describe methods for automatically identifying signature blocks and reply lines in plaintext email messages. This analysis has many potential applications, such as preprocessing email for text-to-speech systems, anonymizing email corpora, improving automatic content-based mail classifiers, and email threading. Our approach applies machine learning to a sequential representation of an email message: each email is represented as a sequence of lines, and each line is represented as a set of features. We compare several state-of-the-art sequential and non-sequential machine learning algorithms on different feature sets, and present experimental results showing that the presence of a signature block in a message can be detected with accuracy higher than 97%; that signature block lines can be identified with accuracy higher than 99%; and that signature block and reply lines can be simultaneously identified with accuracy higher than 98%.
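The line-by-line representation described in the abstract can be illustrated with a minimal Python sketch. The abstract does not list the features the authors used, so every pattern and feature name below (`starts_with_quote_marker`, `separator`, `has_email_address`, `has_phone_number`, `near_end`) is a hypothetical stand-in chosen for illustration, as are the function names `line_features` and `email_to_sequence`.

```python
import re

# Hypothetical patterns: the abstract does not specify the real feature set.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
PHONE_RE = re.compile(r"\d{3}[-. ]\d{3,4}[-. ]\d{4}")
SEPARATOR_RE = re.compile(r"^\s*[-_=*]{2,}\s*$")
QUOTE_RE = re.compile(r"^\s*>")


def line_features(line, position, total):
    """Return the set of binary feature names that fire for one line."""
    features = set()
    if not line.strip():
        features.add("blank")
    if QUOTE_RE.search(line):
        features.add("starts_with_quote_marker")
    if SEPARATOR_RE.search(line):
        features.add("separator")
    if EMAIL_RE.search(line):
        features.add("has_email_address")
    if PHONE_RE.search(line):
        features.add("has_phone_number")
    # Assumed positional cue: the line is among the last 10 of the message.
    if total - position <= 10:
        features.add("near_end")
    return features


def email_to_sequence(text):
    """Represent an email as a list of per-line feature sets, in order."""
    lines = text.splitlines()
    return [line_features(line, i, len(lines)) for i, line in enumerate(lines)]


if __name__ == "__main__":
    message = "Hi Bob,\n> old text\n--\nAlice alice@example.com\n555-123-4567"
    for features in email_to_sequence(message):
        print(sorted(features))
```

A sequence of feature sets like this could then be passed either to a classifier that labels each line independently or to a sequential learner that also considers neighboring lines, which is the comparison the abstract describes.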
Vitor Rocha de Carvalho, William W. Cohen
Added 01 Jul 2010
Updated 01 Jul 2010
Type Conference
Year 2004
Where CEAS