Script identification has long been an active research topic in document analysis. Accurately identifying the script is essential for many downstream steps, such as document sorting, translation, and choosing the linguistic resources used for OCR or handwriting recognition. However, few works address the identification of online handwritten scripts, partly because of the large variations and challenges inherent in handwriting. This paper proposes a novel approach to online handwritten script identification based on the Information Retrieval model. We distinguish among three script families (Arabic, Roman and Tamil) and obtain an average identification accuracy of 93.3%. This result suggests that Information Retrieval models are a promising tool for script identification.
Guo Xian Tan, Christian Viard-Gaudin, Alex C. Kot