Passages can be hidden within a text to circumvent restrictions on their transfer. Such release of compartmentalized information is a concern for all corporate and governmental organizations. We explore a methodology for detecting such hidden passages within a document: the document is divided into passages using various document splitting techniques, and a text classifier is used to categorize each passage. We present a novel document splitting technique called dynamic windowing, which significantly improves precision, recall, and F1 measure.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Information Filtering

General Terms
Algorithm, Experimentation

Keywords
Passage Detection, Text Classification
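To make the split-then-classify pipeline concrete, the sketch below divides a document into fixed-size, overlapping word windows and keeps the windows that a classifier assigns to a target category. This is a generic fixed-windowing baseline, not the dynamic windowing technique described here, whose details are not given in this abstract. The function names, the `window_size` and `step` parameters, and the toy keyword classifier are hypothetical placeholders for illustration only.

```python
from typing import Callable, List, Tuple


def split_into_windows(text: str, window_size: int, step: int) -> List[Tuple[int, str]]:
    """Split text into overlapping windows of `window_size` words.

    Each new window starts `step` words after the previous one.
    Returns (start_word_index, passage_text) pairs. The final window
    may be shorter than `window_size` when the document runs out.
    """
    if window_size <= 0 or step <= 0:
        raise ValueError("window_size and step must be positive")
    words = text.split()
    if not words:
        return []
    windows = []
    start = 0
    while True:
        windows.append((start, " ".join(words[start:start + window_size])))
        # Stop once the current window reaches the end of the document.
        if start + window_size >= len(words):
            break
        start += step
    return windows


def detect_passages(
    text: str,
    classifier: Callable[[str], str],  # hypothetical: maps a passage to a category label
    target_category: str,
    window_size: int = 50,
    step: int = 25,
) -> List[Tuple[int, str]]:
    """Return the windows that the classifier assigns to `target_category`."""
    return [
        (start, passage)
        for start, passage in split_into_windows(text, window_size, step)
        if classifier(passage) == target_category
    ]


if __name__ == "__main__":
    # Toy keyword "classifier", standing in for a trained text classifier.
    toy = lambda p: "secret" if "secret" in p.split() else "other"
    print(detect_passages("x y secret z w v", toy, "secret", window_size=2, step=2))
```

Overlapping windows (`step < window_size`) reduce the chance that a hidden passage is cut in half at a window boundary, at the cost of classifying more passages. A fixed window size cannot adapt to passages of varying length, which is the kind of limitation an adaptive splitting scheme aims to address.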
Nazli Goharian, Saket S. R. Mengle