Protecting sensitive information while preserving the shareability and usability of data is becoming increasingly important. In call centers, a large amount of sensitive customer information is stored in audio recordings. In this work, we address the problem of protecting sensitive information in audio recordings and speech transcripts. We present a semi-supervised method that models sensitive information as a directed graph. We demonstrate the effectiveness of this approach by applying it to the problem of detecting and locating credit card transactions in real-life conversations between agents and customers in a call center.

Categories and Subject Descriptors
H.2.8 [Database Applications]: Data Mining; H.4.0 [Information Systems Applications]: General

General Terms
Algorithms, Experimentation

Keywords
Call Center Analytics, Clustering, Information Privacy
Tanveer A. Faruquie, Sumit Negi, Anup Chalamalla,