Parallel web pages are important source of training data for statistical machine translation. In this paper, we present a new approach to sentence alignment on parallel web pages....
We investigate the automatic detection of sentences containing linguistic hedges using corpus statistics and syntactic patterns. We take Wikipedia as an already annotated corpus u...
Enriching speech recognition output with sentence boundaries improves its human readability and enables further processing by downstream language processing modules. We have const...
Yang Liu, Nitesh V. Chawla, Mary P. Harper, Elizab...
This paper describes our first participation in the Indian language sub-task of the main Adhoc monolingual and bilingual track in CLEF1 competition. In this track, the task is to...
Data Selection has emerged as a common issue in language technologies. We define Data Selection as the choosing of a subset of training data that is most effective for a given tas...
Jonathan Clark, Robert E. Frederking, Lori S. Levi...