There often exist multiple corpora for the same natural language processing (NLP) tasks. However, such corpora are generally used independently due to distinctions in annotation s...
Because English is a low-morphology language, current statistical parsers tend to ignore morphology and accept some level of redundancy. This paper investigates how costly such re...
Matthew Honnibal, Jonathan K. Kummerfeld, James R....
Many existing methods for bilingual lexicon learning from comparable corpora are based on similarity of context vectors. These methods suffer from noisy vectors that greatly affec...
Unknown words are a major issue for large-scale grammars of natural language. We propose a machine-learning-based algorithm for acquiring lexical entries for all forms in the para...
So far, most Chinese natural language processing work neglects punctuation marks or oversimplifies their functions. To improve the efficiency of Chinese similarity computing, this pap...