Software’s expense owes partly to frequent reimplementation of similar functionality and partly to maintenance of patches, ports or components targeting evolving interfaces. Mor...
This paper proposes a fast and simple unsupervised word segmentation algorithm that utilizes the local predictability of adjacent character sequences, while searching for a leaste...
Many approaches to unsupervised morphology acquisition incorporate the frequency of character sequences with respect to each other to identify word stems and affixes. This typical...
Document clustering has many uses in natural language tools and applications. For instance, summarizing sets of documents that all describe the same event requires first identifyi...
Previous work on statistical language generation has primarily focused on grammaticality and naturalness, scoring generation possibilities according to a language model or user fe...