A low-effort data mining approach to labeling network event records in a WLAN is proposed. The problem addressed arises frequently when AI and data mining techniques are applied to network intrusion detection: they require a training dataset of network event records, each labeled as either normal or a type of intrusion. Given the dynamic nature of intrusion detection, such a dataset is often very large, especially in a WLAN, where several devices communicate with the network in a rather ad hoc manner. The size of the dataset increases the effort the domain expert must spend labeling every training record. In the proposed approach, a clustering algorithm is first used to form groups of similar network events; the expert then analyzes each cluster and assigns it to one of four classes: definite intrusion, possibly intrusion, probably normal, and definite normal. An ensemble classifier is then used to cleanse the labeled dataset of likely mislabeling errors made by ...
Taghi M. Khoshgoftaar, Chris Seiffert, Naeem Seliy
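
The pipeline outlined in the abstract (cluster the records, label whole clusters, then cleanse the labels with an ensemble) can be illustrated with a minimal sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the two-dimensional toy features, the use of k-means as the clustering algorithm, the three base learners (1-NN, 3-NN, nearest class mean), the leave-one-out majority-vote filter, and all function names. Only two of the four class names are used, and the expert's error is simulated by flipping one label by hand.

```python
import math
from collections import Counter


def kmeans(points, k, iters=20):
    """Plain k-means with farthest-point initialisation; returns a cluster index per point."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Next seed: the point farthest from all centroids chosen so far.
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return assign


def knn_predict(train, query, k):
    """Majority label among the k training records nearest to the query."""
    nearest = sorted(train, key=lambda rec: math.dist(rec[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def centroid_predict(train, query):
    """Label of the class whose feature mean is nearest to the query."""
    by_label = {}
    for feats, label in train:
        by_label.setdefault(label, []).append(feats)
    means = {lab: tuple(sum(x) / len(pts) for x in zip(*pts)) for lab, pts in by_label.items()}
    return min(means, key=lambda lab: math.dist(means[lab], query))


def ensemble_filter(records):
    """Flag records whose label is rejected by a majority of base learners (leave-one-out)."""
    flagged = []
    for i, (feats, label) in enumerate(records):
        train = records[:i] + records[i + 1:]
        votes = [
            knn_predict(train, feats, 1),
            knn_predict(train, feats, 3),
            centroid_predict(train, feats),
        ]
        if sum(v != label for v in votes) > len(votes) / 2:
            flagged.append(i)
    return flagged


# Toy event records with two hypothetical numeric features each.
points = [
    (1.0, 1.0), (1.2, 0.9), (0.8, 1.1), (1.1, 1.3), (0.9, 0.8),
    (5.0, 5.0), (5.2, 4.9), (4.8, 5.1), (5.1, 5.3), (4.9, 4.8),
]

# Step 1: group similar events.
clusters = kmeans(points, 2)

# Step 2: the expert labels each cluster once (hypothetical decisions).
cluster_labels = {0: "definite normal", 1: "definite intrusion"}
records = [(p, cluster_labels[c]) for p, c in zip(points, clusters)]

# Simulated labeling error: record 4 receives the wrong class.
records[4] = (records[4][0], "definite intrusion")

# Step 3: the ensemble flags records whose labels look inconsistent.
flagged = ensemble_filter(records)
print("clusters:", clusters)
print("flagged record indices:", flagged)
```

In this sketch the expert makes one decision per cluster rather than one per record, which is where the labeling effort is saved; the filter then returns the flagged records for re-inspection instead of relabeling them automatically.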