It is crucial for cross-language information retrieval (CLIR) systems to deal with the translation of unknown queries1 due to that real queries might be short. The purpose of this...
Web text has been successfully used as training data for many NLP applications. While most previous work accesses web text through search engine hit counts, we created a Web Corpu...
Abstract. Parallel corpora are a valuable resource for machine translation, but at present their availability and utility is limited by genreand domain-speci city, licensing restri...
Originally conceived as a "naive" baseline experiment using traditional n-gram language models as classifiers, the NCLEANER system has turned out to be a fast and lightw...
Abstract. Due to the inherent difficulties associated with manual ontology building, knowledge acquisition and reuse are often seen as methods that can make this tedious process ea...
Elena Paslaru Bontas, David Schlangen, Thomas Schr...