Sciweavers

ACL
2007

Automatic Part-of-Speech Tagging for Bengali: An Approach for Morphologically Rich Languages in a Poor Resource Scenario

14 years 1 months ago
Automatic Part-of-Speech Tagging for Bengali: An Approach for Morphologically Rich Languages in a Poor Resource Scenario
This paper describes our work on building Part-of-Speech (POS) tagger for Bengali. We have use Hidden Markov Model (HMM) and Maximum Entropy (ME) based stochastic taggers. Bengali is a morphologically rich language and our taggers make use of morphological and contextual information of the words. Since only a small labeled training set is available (45,000 words), simple stochastic approach does not yield very good results. In this work, we have studied the effect of using a morphological analyzer to improve the performance of the tagger. We find that the use of morphology helps improve the accuracy of the tagger especially when less amount of tagged corpora are available.
Sandipan Dandapat, Sudeshna Sarkar, Anupam Basu
Added 29 Oct 2010
Updated 29 Oct 2010
Type Conference
Year 2007
Where ACL
Authors Sandipan Dandapat, Sudeshna Sarkar, Anupam Basu
Comments (0)