Word Clustering and Word Selection Based Feature Reduction for MaxEnt Based Hindi NER



Statistical machine learning methods are used to train a Named Entity Recognizer from annotated data. Methods such as Maximum Entropy and Conditional Random Fields rely on features for training. These methods tend to overfit when the available training corpus is limited, especially if the number of features is large or a feature can take a large number of values. To overcome this, we propose two feature reduction techniques, one based on word clustering and one based on word selection. A number of word similarity measures are proposed for clustering words for the Named Entity Recognition task, and a few corpus-based statistical measures are used to select important words. The feature reduction techniques lead to a substantial performance improvement over the baseline Maximum Entropy technique.
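The general idea of shrinking a word-valued feature can be illustrated with a small Python sketch. This is not the authors' method: the toy corpus, the selection statistic (how often a word precedes a named entity), the thresholds, and the crude clustering rule (group words by the tag they most often precede) are all assumptions chosen for illustration, and every function name here is hypothetical.

```python
from collections import Counter, defaultdict

# Toy tagged corpus (hypothetical data): (word, tag) pairs; "O" marks non-entities.
CORPUS = [
    ("mr", "O"), ("sharma", "PER"), ("visited", "O"), ("delhi", "LOC"),
    ("dr", "O"), ("gupta", "PER"), ("visited", "O"), ("mumbai", "LOC"),
    ("mr", "O"), ("gupta", "PER"), ("left", "O"), ("delhi", "LOC"),
    ("the", "O"), ("train", "O"), ("left", "O"), ("the", "O"),
    ("station", "O"), ("dr", "O"), ("sharma", "PER"),
]


def context_profiles(corpus):
    """For each word, count the tags of the token that immediately follows it."""
    profiles = defaultdict(Counter)
    for (word, _), (_, next_tag) in zip(corpus, corpus[1:]):
        profiles[word][next_tag] += 1
    return profiles


def select_words(profiles, min_ne_ratio=0.6, min_count=2):
    """Word selection (assumed statistic): keep words that are seen at least
    min_count times as a previous word and precede an entity often enough."""
    selected = set()
    for word, counts in profiles.items():
        total = sum(counts.values())
        ne_count = total - counts["O"]
        if total >= min_count and ne_count / total >= min_ne_ratio:
            selected.add(word)
    return selected


def cluster_words(profiles, words):
    """Word clustering (crude stand-in for a similarity measure): give each
    word the label of the tag it most often precedes."""
    return {w: "PRE_" + profiles[w].most_common(1)[0][0] for w in words}


def prev_word_feature(prev_word, clusters):
    """Reduced 'previous word' feature: a cluster label, or OTHER."""
    return clusters.get(prev_word, "OTHER")


profiles = context_profiles(CORPUS)
selected = select_words(profiles)
clusters = cluster_words(profiles, selected)
feature_values = {prev_word_feature(w, clusters) for w, _ in CORPUS}
```

In this toy setting the "previous word" feature, which could otherwise take one value per vocabulary word, takes only a handful of cluster labels plus `OTHER`, which is the kind of reduction that can help a MaxEnt model avoid overfitting on a small corpus.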

Sujan Kumar Saha, Pabitra Mitra, Sudeshna Sarkar


ACL 2008 | Computational Linguistics | Feature Reduction | Machine Learning Methods | Maximum Entropy


Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	ACL
Authors	Sujan Kumar Saha, Pabitra Mitra, Sudeshna Sarkar
