We address the problem of selecting non-domain-specific language model training data to build auxiliary language models for use in tasks such as machine translation. Our approach i...
Data Selection has emerged as a common issue in language technologies. We define Data Selection as choosing the subset of training data that is most effective for a given tas...
Jonathan Clark, Robert E. Frederking, Lori S. Levi...
Korean is an agglutinative language that does not have explicit word boundaries. It is also a highly inflective language that exhibits severe coarticulation effects. These charac...
Sakriani Sakti, Andrew M. Finch, Ryosuke Isotani, ...
In this paper, we report on a study performed within the "Semantics of History" project on how descriptions of historical events are realized in different types...
It is a traditional belief that, in order to scale up to more effective retrieval and access methods, modern Information Retrieval has to take greater account of text content. The modalit...
Roberto Basili, Alessandro Moschitti, Maria Teresa...