This paper addresses a natural language text alignment problem arising in textual genetic criticism, a humanities discipline in which different versions of a text must be compared. The...
Abstract. One important component of interactive systems is the generation component. While template-based generation is appropriate in many cases (for example, task oriented spoke...
We show that excluding outliers from the training data significantly improves the kNN classifier, which in this case performs about 10% better than the best known method, Centroid-based...
Training a statistical machine translation system starts with tokenizing a parallel corpus. Some languages such as Chinese do not incorporate spacing in their writing system, which creat...
This paper uses the URL word breaking task as an example to elaborate what we identify as crucial in designing statistical natural language processing (NLP) algorithms for Web scale ...
Kuansan Wang, Christopher Thrasher, Bo-June Paul H...