Content identification has many applications, ranging from preventing illegal sharing of copyrighted content on video-sharing websites to automatic identification and tagging of ...
This paper presents the Multiword Expression Toolkit (mwetoolkit), an environment for type- and language-independent MWE identification from corpora. The mwetoolkit provides a targ...
Carlos Ramisch, Aline Villavicencio, Christian Boi...
The goal of the DARPA MADCAT (Multilingual Automatic Document Classification Analysis and Translation) Program is to automatically convert foreign language text images into Englis...
News articles about the same event published over time have properties that challenge NLP and IR applications. A cluster of such texts typically exhibits instances of paraphrase a...
There is a growing interest in intelligent assistants for a variety of applications from organizing tasks for knowledge workers to helping people with dementia. In this paper, we ...
Alan Fern, Sriraam Natarajan, Kshitij Judah, Prasa...