We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled a...
Feng Jiao, Shaojun Wang, Chi-Hoon Lee, Russell Gre...
We present a computationally tractable account of the interactions between sentence markers and focus marking in Somali. Somali, as a Cushitic language, has a basic pattern wherei...
Accurately representing synonymy using distributional similarity requires large volumes of data to reliably represent infrequent words. However, the na...
Japanese dependency structure is usually represented by relationships between phrasal units called bunsetsus. One of the biggest problems with dependency structure analysis in spo...
Developing better methods for segmenting continuous text into words is important for improving the processing of Asian languages, and may shed light on how humans learn to segment...
Sharon Goldwater, Thomas L. Griffiths, Mark Johnso...
In this paper we present how the automatic extraction of events from text can be used to both classify narrative texts according to plot quality and produce advice in an interacti...
Deterministic parsing guided by treebank-induced classifiers has emerged as a simple and efficient alternative to more complex models for data-driven parsing. We present a systemat...
A grammatical method of combining two kinds of speech repair cues is presented. One cue, prosodic disjuncture, is detected by a decision tree-based ensemble classifier that uses a...
John Hale, Izhak Shafran, Lisa Yung, Bonnie J. Dor...
Cross-language Text Categorization is the task of assigning semantic classes to documents written in a target language (e.g. English) while the system is trained using labeled doc...