A Unified Tagging Approach to Text Normalization


Download keg.cs.tsinghua.edu.cn

This paper addresses the issue of text normalization, an important yet often overlooked problem in natural language processing. By text normalization, we mean converting ‘informally inputted’ text into canonical form by eliminating ‘noise’ in the text and detecting its paragraph and sentence boundaries. Previously, text normalization issues were often handled in an ad hoc fashion or studied separately. This paper first gives a formalization of the entire problem. It then proposes a unified tagging approach that performs the task using Conditional Random Fields (CRF). The paper shows that, with the introduction of a small set of tags, most text normalization tasks can be performed within the approach. The accuracy of the proposed method is high because the subtasks of normalization are interdependent and should be performed together. Experimental results on email data cleaning show that the proposed method significantly outperforms the approach of using cas...
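To make the tagging idea concrete, here is a minimal sketch of the decoding step only: once every token carries a tag, a single pass over the tag sequence can remove noise, restore case, and mark sentence and paragraph boundaries together. The tag names (KEEP, DEL, LOWER, CAP, SENT_END, PARA_END) and the function apply_tags are assumptions invented for this illustration and are not the tag set defined in the paper. In the paper the tags would be predicted by a CRF; here they are supplied by hand.

```python
from typing import List, Tuple

# Hypothetical tag set, invented for this illustration only.
KEEP = "KEEP"          # emit the token unchanged
DEL = "DEL"            # drop the token (noise)
LOWER = "LOWER"        # emit the token lowercased
CAP = "CAP"            # emit the token with only its first letter uppercased
SENT_END = "SENT_END"  # punctuation that closes the current sentence
PARA_END = "PARA_END"  # line break that closes the current paragraph


def apply_tags(tagged: List[Tuple[str, str]]) -> List[List[str]]:
    """Turn (token, tag) pairs into paragraphs, each a list of sentences."""
    paragraphs: List[List[str]] = []
    sentences: List[str] = []
    words: List[str] = []

    def flush_sentence() -> None:
        if words:
            sentences.append(" ".join(words))
            words.clear()

    def flush_paragraph() -> None:
        flush_sentence()
        if sentences:
            paragraphs.append(list(sentences))
            sentences.clear()

    for token, tag in tagged:
        if tag == DEL:
            continue
        elif tag == PARA_END:
            flush_paragraph()
        elif tag == SENT_END:
            # Attach the punctuation to the preceding word, then close the sentence.
            if words:
                words[-1] += token
            else:
                words.append(token)
            flush_sentence()
        elif tag == LOWER:
            words.append(token.lower())
        elif tag == CAP:
            words.append(token.capitalize())
        elif tag == KEEP:
            words.append(token)
        else:
            raise ValueError(f"unknown tag: {tag!r}")

    flush_paragraph()  # close whatever is still open at end of input
    return paragraphs


# Hand-tagged toy input; a trained tagger would normally produce these tags.
example = [
    ("i", CAP), ("got", KEEP), ("IT", LOWER), (".", SENT_END), ("\n", DEL),
    ("thanks", CAP), ("!", SENT_END), ("!!!", DEL), ("\n\n", PARA_END),
    ("--", DEL), ("jie", CAP),
]
print(apply_tags(example))
```

Because one tag sequence drives every decision, a tagger that assigns the tags jointly can let, for example, a sentence-boundary decision inform the capitalization of the next word, which is the kind of interdependence between subtasks the abstract points to.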

Conghui Zhu, Jie Tang, Hang Li, Hwee Tou Ng, Tiejun Zhao


ACL 2007 | Computational Linguistics | Text Normalization | Text Normalization Issues | Text Normalization Tasks


» On Extendable Software Architecture for Spam Email Filtering

» ArnetMiner extraction and mining of academic social networks

» Machine Learning for Question Answering from Tabular Data

Post Info

Added	29 Oct 2010
Updated	29 Oct 2010
Type	Conference
Year	2007
Where	ACL
Authors	Conghui Zhu, Jie Tang, Hang Li, Hwee Tou Ng, Tiejun Zhao
