Abstract. Katakana, Japanese phonogram mainly used for loan words, is a troublemaker in Japanese word segmentation. Since Katakana words are heavily domaindependent and there are m...
Abstract. In this paper, we describe a new unsupervised sentence boundary detection system and present a comparative study evaluating its performance against different systems foun...
Jan Strunk, Carlos Nascimento Silla Jr., Celso A. ...
To take advantage of the ever-increasing volume of diagrams in electronic form, it is crucial that we have methods for parsing diagrams. Once a structured, content-based descripti...
The ADROIT system that we are developing allows automatic discourse analysis of information rich natural language texts extracted directly from the web. We use guidelines and rela...
In the early days of email, widely-used conventions for indicating quoted reply content and email signatures made it easy to segment email messages into their functional parts. To...