A Discriminative Candidate Generator for String Transformations

String transformation, which maps a source string s into its desired form t, underlies applications such as stemming, lemmatization, and spelling correction. The essential step in string transformation is generating the candidates into which a given string s is likely to be transformed. This paper presents a discriminative approach to generating candidate strings. We use substring substitution rules as features and score them using an L1-regularized logistic regression model. We also propose a procedure for generating negative instances that affect the decision boundary of the model. The advantage of this approach is that candidate strings can be enumerated by an efficient algorithm, because the processes of string transformation are tractable in the model. We demonstrate the remarkable performance of the proposed method in normalizing inflected words and spelling variations.
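To make the idea in the abstract concrete, here is a minimal sketch of rule-based candidate generation with logistic scoring. It is an illustration under stated assumptions, not the authors' algorithm: the rule table `RULES`, the bias `BIAS`, the end-of-string marker `$`, the `max_rules` limit, and the function names are all hypothetical, and the weights are made up rather than learned. In a trained L1-regularized model most rule weights would be exactly zero, so only the surviving rules would need to be stored in such a table.

```python
import math
from itertools import combinations

END = "$"  # hypothetical end-of-string marker used to anchor suffix rules

# Hypothetical rule weights (not learned from data):
# (source substring, target substring) -> weight.
RULES = {
    ("ies" + END, "y" + END): 2.0,
    ("s" + END, END): 1.0,
    ("ing" + END, END): 1.5,
}
BIAS = -1.0  # hypothetical bias term of the logistic model


def rule_applications(marked, rules):
    """Yield (start, end, replacement, weight) for every rule match in `marked`."""
    for (src, dst), weight in rules.items():
        start = marked.find(src)
        while start != -1:
            yield (start, start + len(src), dst, weight)
            start = marked.find(src, start + 1)


def generate_candidates(s, rules=RULES, bias=BIAS, max_rules=2):
    """Return (candidate, probability) pairs, highest probability first.

    A candidate is built by applying up to `max_rules` non-overlapping
    rules to `s`; its score is the sigmoid of the bias plus the weights
    of the rules that were applied.
    """
    marked = s + END
    apps = list(rule_applications(marked, rules))
    best = {}
    for k in range(1, max_rules + 1):
        for combo in combinations(apps, k):
            combo = sorted(combo)  # order the applications by start offset
            # Skip combinations whose matched spans overlap.
            if any(a[1] > b[0] for a, b in zip(combo, combo[1:])):
                continue
            pieces, pos, score = [], 0, bias
            for start, end, dst, weight in combo:
                pieces.append(marked[pos:start])
                pieces.append(dst)
                pos = end
                score += weight
            pieces.append(marked[pos:])
            t = "".join(pieces)
            if t.endswith(END):
                t = t[:-1]  # drop the marker before returning the string
            prob = 1.0 / (1.0 + math.exp(-score))
            if prob > best.get(t, 0.0):
                best[t] = prob
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for candidate, prob in generate_candidates("studies"):
        print(candidate, round(prob, 3))
```

With these made-up weights, "studies" yields "study" ranked above "studie", because the `ies$` rule carries a larger weight than the `s$` rule. Enumerating all combinations is only workable for a small rule table; the abstract states that the model admits an efficient enumeration algorithm, which this brute-force sketch does not attempt to reproduce.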

Naoaki Okazaki, Yoshimasa Tsuruoka, Sophia Ananiadou, Jun-ichi Tsujii

Candidate Strings | EMNLP 2008 | Natural Language Processing | Source String | String |

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2008
Where	EMNLP
Authors	Naoaki Okazaki, Yoshimasa Tsuruoka, Sophia Ananiadou, Jun-ichi Tsujii
