User-generated content is characterized by short, noisy documents, with many spelling errors and unexpected language usage. To bridge the vocabulary gap between the user's in...
Wouter Weerkamp, Krisztian Balog, Maarten de Rijke
Large-scale efforts are underway to create dependency treebanks and parsers for Hindi and other Indian languages. Hindi, being a morphologically rich, flexible word order language...
Web search quality can vary widely across languages, even for the same information need. We propose to exploit this variation in quality by learning a ranking function on bilingua...
This paper introduces a new algorithm to parse discourse within the framework of Rhetorical Structure Theory (RST). Our method is based on recent advances in the field of statisti...
A number of studies have presented machine-learning approaches to semantic role labeling with the availability of corpora such as FrameNet and PropBank. These corpora define the seman...