Detecting anomalous records in categorical datasets


We consider the problem of detecting anomalies in high-arity categorical datasets. In most applications, anomalies are defined as data points that are 'abnormal'. Quite often we have access to data that consists mostly of normal records, along with a small percentage of unlabelled anomalous records. We are interested in unsupervised anomaly detection, where we use the unlabelled data for training and detect records that do not follow the definition of normality. A standard approach is to create a model of normal data and compare test records against it. A probabilistic approach builds a likelihood model from the training data, and records are tested for anomalousness based on the complete record likelihood given the probability model. For categorical attributes, Bayes nets give a standard representation of the likelihood. While this approach is good at finding outliers in the dataset, it often tends to detect records with attribute values that are rare. Sometim...
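The baseline the abstract describes, scoring each record by its complete-record likelihood under a model fitted to unlabelled data, can be illustrated with a minimal sketch. This is not the authors' method: it assumes every attribute is independent (a Bayes net with no edges) and uses Laplace smoothing, and the function names `fit_marginals`, `log_likelihood`, and `rank_anomalies` and the parameter `alpha` are hypothetical.

```python
import math
from collections import Counter


def fit_marginals(records):
    """Count value frequencies per attribute.

    Assumption: attributes are treated as independent, i.e. a Bayes net
    with no edges. A real Bayes net would also model dependencies.
    """
    n_attrs = len(records[0])
    counts = [Counter(r[i] for r in records) for i in range(n_attrs)]
    return counts, len(records)


def log_likelihood(record, counts, n, alpha=1.0):
    """Complete-record log-likelihood under Laplace-smoothed marginals."""
    total = 0.0
    for value, attr_counts in zip(record, counts):
        # Observed values plus one extra slot reserved for unseen values.
        k = len(attr_counts) + 1
        total += math.log((attr_counts[value] + alpha) / (n + alpha * k))
    return total


def rank_anomalies(train, test):
    """Return test records ordered from least to most likely."""
    counts, n = fit_marginals(train)
    return sorted(test, key=lambda r: log_likelihood(r, counts, n))
```

Records at the front of the returned list are the candidates for anomalies. Because the score is a product of per-attribute probabilities, any record containing a rare value ranks low, which is the tendency the abstract points out for likelihood-based detection.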

Kaustav Das, Jeff G. Schneider


Arity Categorical Datasets | Complete Record Likelihood | Data Mining | KDD 2007 | Unlabelled Anomalous Records


» Using identity credential usage logs to detect anomalous service accesses

» Detection of Interdomain Routing Anomalies Based on Higher-Order Path Analysis

» Active Learning for Anomaly and Rare-Category Detection

» Using treemaps for variable selection in spatiotemporal visualisation


Added	30 Nov 2009
Updated	30 Nov 2009
Type	Conference
Year	2007
Where	KDD
Authors	Kaustav Das, Jeff G. Schneider
