Mining bilingual data (including bilingual sentences and terms1 ) from the Web can benefit many NLP applications, such as machine translation and cross language information retrie...
Long Jiang, Shiquan Yang, Ming Zhou, Xiaohua Liu, ...
User generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
Unsupervised word representations are very useful in NLP tasks both as inputs to learning algorithms and as extra word features in NLP systems. However, most of these models are b...
Eric H. Huang, Richard Socher, Christopher D. Mann...
We present a paradigm for uniting the diverse strands of XML-based Web technologies by allowing them to be incorporated within a single document. This overcomes the distinction be...
Social network-based systems usually suffer from two major limitations: they tend to rely on a single data source (e.g. email traffic), and the form of network patterns is often p...
Yevgeniy Eugene Medynskiy, Nicolas Ducheneaut, Aym...