This poster presents an overview of the techniques we used to develop and evaluate a text categorisation system for the PRINCIP project, which sets out to classify racist texts automatically. Support Vector Machines (SVMs) are used to categorise web pages according to whether or not they are racist. A term can be defined in different ways; here we compare a bag-of-words (BOW) representation of a web page with a bigram representation, each used as input to an SVM.

Keywords: Text Categorisation/Classification, Machine Learning, Support Vector Machines.
Edel Greevy, Alan F. Smeaton
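
The difference between the two term definitions compared in the abstract can be illustrated with a minimal sketch. It is not the PRINCIP implementation: the tokeniser, the function names (`tokenize`, `bow_terms`, `bigram_terms`, `to_vector`) and the sample sentence are all assumptions made for illustration. It only shows how the same text yields different term sets under a bag-of-words and a bigram interpretation.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic word tokens (assumed tokeniser)."""
    return re.findall(r"[a-z]+", text.lower())


def bow_terms(text):
    """Bag of words: each term is a single token, so word order is discarded."""
    return Counter(tokenize(text))


def bigram_terms(text):
    """Bigrams: each term is a pair of adjacent tokens, so local word order is kept."""
    tokens = tokenize(text)
    return Counter(zip(tokens, tokens[1:]))


def to_vector(term_counts, vocabulary):
    """Map term counts onto a fixed vocabulary order, giving one feature vector per page."""
    return [term_counts.get(term, 0) for term in vocabulary]


# Hypothetical sample text standing in for the content of a web page.
page = "the cat sat on the mat"

bow = bow_terms(page)
bigrams = bigram_terms(page)

# A vocabulary would normally be built from the whole training set;
# here it is taken from the single sample page for brevity.
bow_vocab = sorted(bow)
bow_vector = to_vector(bow, bow_vocab)
bigram_vocab = sorted(bigrams)
bigram_vector = to_vector(bigrams, bigram_vocab)
```

Vectors of this kind, one per page and built over a shared vocabulary, are the sort of input a linear SVM implementation would be trained on; the choice of classifier library and weighting scheme is left open here because the abstract does not specify them.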