Progress in insider-threat detection is currently limited by a lack of realistic, publicly available, real-world data. For reasons of privacy and confidentiality, no one wants to expose their sensitive data to the research community. Data can be sanitized to mitigate privacy and confidentiality concerns, but the mere act of sanitizing the data may introduce artifacts that compromise its utility for research purposes. If sanitization artifacts change the results of insider-threat experiments, then those results could lead to conclusions that do not hold in the real world. The goal of this work is to investigate the consequences of sanitization artifacts on insider-threat detection experiments. We assemble a suite of tools and present a methodology for collecting and sanitizing data. We use these tools and methods in an experimental evaluation of an insider-threat detection system. We compare the results of the evaluation using raw data to the results using each of three types of sanitized data.
Kevin S. Killourhy, Roy A. Maxion
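To make the sanitization concern concrete, the sketch below illustrates one common sanitization technique: consistent pseudonymization of usernames in log records with a keyed hash. This is an illustrative assumption, not the sanitization methods evaluated in the paper; the key, field names, and log format are all hypothetical. It also hints at the artifact tension the abstract raises: a consistent mapping preserves per-user structure that detection research depends on, whereas an inconsistent scheme (e.g., a fresh random token per record) would destroy that structure and could change experimental results.

```python
# A minimal sketch of consistent pseudonymization, assuming a simple
# (hypothetical) log format of (timestamp, username, action) records.
# Not the paper's method; for illustration only.
import hashlib
import hmac

SECRET_KEY = b"site-specific secret"  # held by the data owner, never released

def pseudonymize(username: str) -> str:
    """Map a username to a stable pseudonym via HMAC-SHA256.

    The same input always yields the same pseudonym, so per-user
    behavior patterns survive sanitization while raw identities do not.
    """
    digest = hmac.new(SECRET_KEY, username.encode("utf-8"), hashlib.sha256)
    return "user-" + digest.hexdigest()[:12]

if __name__ == "__main__":
    records = [
        ("2007-03-14 09:12:01", "alice", "login"),
        ("2007-03-14 09:15:33", "alice", "file-read"),
        ("2007-03-14 09:20:47", "bob", "login"),
    ]
    # Both "alice" records map to the same pseudonym, so sequences of
    # actions by one user remain linkable after sanitization.
    for timestamp, user, action in records:
        print(timestamp, pseudonymize(user), action)
```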